I'm trying to figure out how Amazon Web Services Elastic Load Balancing (ELB) can provide zero downtime.

Elastic Load Balancing pings a path on your server at a regular interval (normally every couple of seconds). If it doesn't receive a response within a set period of time (normally a second or two), it takes the server offline and doesn't send any more traffic to that server until it comes back online.

What I'm confused about is the gap between a server failing and ELB noticing. The server will eventually be taken offline, but it takes a few seconds for the next ping to fail, and during that window ELB can still send traffic to a server that is having issues. I'm assuming there is a way to eliminate this gap so that traffic only goes to TRULY active servers. How can I achieve this and get zero downtime in my application?

Charlie Fish
  • It depends. Is this a web interface accessed by a browser, a web service (SOAP), an iPhone app, or something else? If you want 100% uptime / 100% reliability you're going to have to go to extremes in terms of effort and cost, and you can't ever guarantee 100%. You really need to determine your actual SLA - 99%, 99.9%, 99.99999%, etc. – Tim Jul 27 '16 at 00:49
  • @Tim Web service/application – Charlie Fish Jul 27 '16 at 01:31
  • @Tim I guess I should say like a browser application. Like a normal website. – Charlie Fish Jul 27 '16 at 01:40
  • Google, Facebook, Twitter, and Amazon all have had occasional downtime. If you think you can do better, well, have fun. 100% uptime is simply not realistic. – ceejayoz Jul 27 '16 at 02:00

1 Answer

There is conflicting information about this online. A minority of resources say ELB retries a request if no response arrives from the server within the default 60 second timeout. Others say ELB doesn't retry requests. The AWS documentation doesn't clearly say what happens when an ELB times out, which is a fairly significant omission. Based on what I've read, I tend to think that if your back end server times out, the client is sent an error code, probably a timeout status such as 504 Gateway Timeout or 408 Request Timeout. You should test this, because my advice below is based on this assumption. If ELB does retry, then my advice below is incorrect.
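One way to run the test suggested above is a small probe that reports exactly what comes back through the load balancer. This is only a sketch using the Python standard library: the URL in the usage comment is hypothetical, and it assumes you have deployed a back end path that deliberately responds slower than the ELB idle timeout. The `opener` parameter is an illustrative hook, not part of any AWS API.

```python
import urllib.error
import urllib.request


def probe(url, client_timeout=120.0, opener=urllib.request.urlopen):
    """Return the HTTP status seen by the client, or a label if none arrived.

    client_timeout should be longer than the load balancer's idle timeout,
    so that the load balancer (not this client) is the one that gives up.
    """
    try:
        with opener(url, timeout=client_timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code
    except OSError as err:
        # Connection refused, reset, DNS failure, or a client-side timeout.
        return "no HTTP response: {!r}".format(err)


# Hypothetical usage against a deliberately slow back end path:
#   print(probe("http://my-elb.example.com/sleep-90-seconds"))
```

If the probe prints an error status after roughly the idle timeout, and the back end logs show the request arriving only once, that suggests the request was not retried on another instance.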

I don't believe what you want is possible using ELB for a standard web application, because of the lack of retries. Bigger picture, you can't guarantee 100% availability; it's virtually impossible. You need to set your availability target to a realistic level and then architect your system to achieve it. For example, you might have two regions active, with Route 53 doing geographic load balancing with failover. Even then you won't get 100%, because that setup tests instances and sends requests to those thought to be healthy; it doesn't retry requests that fail.

Assuming ELB won't retry a request when a server is down or times out, you would have to put in your own retry logic or your own load balancer, which itself could fail. Hardware outside of AWS might work but isn't a good idea, and your own load balancer inside AWS is a bad idea because you're unlikely to be able to create a load balancer as reliable as ELB.
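To make "your own logic" concrete, here is a minimal sketch of retry-on-failure across several back ends. It is an illustration of the idea, not something ELB provides: the names `send_with_failover`, `AllBackendsFailed`, and the `send` callable are all hypothetical, and the set of statuses treated as retryable is an assumption. Retrying like this is only safe for idempotent requests, since a failed attempt may already have been partly processed.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


class AllBackendsFailed(Exception):
    """Raised when every back end either errored or returned a retryable status."""


def send_with_failover(
    backends: Sequence[str],
    send: Callable[[str], int],
    retry_statuses: FrozenSet[int] = frozenset({502, 503, 504}),
) -> Tuple[str, int]:
    """Try each back end in order and return (backend, status) for the first usable reply.

    `send` performs one request against one back end and returns the HTTP
    status; it is expected to raise OSError on connection errors and timeouts.
    """
    failures: List[Tuple[str, str]] = []
    for backend in backends:
        try:
            status = send(backend)
        except OSError as exc:
            failures.append((backend, repr(exc)))
            continue  # connection-level failure: move on to the next back end
        if status in retry_statuses:
            failures.append((backend, "HTTP {}".format(status)))
            continue  # back end answered, but with a status we treat as "down"
        return backend, status
    raise AllBackendsFailed(failures)
```

Note that this code becomes a single point of failure of its own, which is the concern raised above about building your own balancing layer.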

I suggest you concentrate on making your web / application servers stable, scalable, and stateless so they can be scaled up and down as required.

Tim
  • OK, so there is no way to set up ELB to check an instance before sending traffic to it, and if the check fails, mark it unhealthy and try the next one? It seems like a pretty simple thing on the surface: eliminating the delay between an actual failure and a failed ELB test. But that isn't possible with ELB? – Charlie Fish Jul 27 '16 at 02:01
  • Testing before every request is impractical. Retry on failure would be practical, but based on my reading it's not implemented by ELB. You could run a test to validate that on timeout ELB sends an error rather than retrying. However, if your server sends back an error code, there's no way ELB will retry on another server. – Tim Jul 27 '16 at 02:51
  • Retrying on the next server after a failure or timeout seems like the best option then, or something along those lines. Thanks for your help though. – Charlie Fish Jul 27 '16 at 02:56
  • I just ran a test using ELB and it seems to take the server out of the rotation almost instantly. I set up two servers with different content, and if I shut down one server, ELB automatically takes it out of the rotation and only sends traffic to working servers. – Charlie Fish Jul 27 '16 at 18:52
  • ELB has a configurable testing period, but if one request fails it probably does additional tests immediately. This isn't quite what you asked, though. You asked about 100% uptime, which was a fairly specific query. – Tim Jul 27 '16 at 19:06
  • Makes sense. I should have been clear. Sorry about that. Thanks for your help. – Charlie Fish Jul 27 '16 at 19:08
  • No problem, happy to help. – Tim Jul 27 '16 at 19:14