
Please note that this question is about the ELB itself, not the EC2 instances behind it.

Situation

We have experienced the following ELB issue recently:

  • 50% of requests did not reach our backend, and it seems they did not reach the ELB itself either
  • ELB monitoring via AWS console didn't show anything unusual (zero ELB 4xx and ELB 5xx)
  • external checks verified that our backend EC2 instances were running well and could be reached

Our assumption is that the EC2 instance the ELB runs on had connectivity issues. Our ad hoc fix was to create a new ELB (in front of the same set of EC2 instances) and change the DNS records.

Questions

  • is this something that can happen often?
  • are there any tools that can detect this quickly enough? (We always assume a problem like this is our fault, and only after thorough checks did we start looking at AWS.)
  • is there a way to prevent this from happening at all?
  • Contact Amazon support. – Craig Watson May 12 '15 at 09:14
  • we will, but at our price range this will be addressed in days, whereas we had to fix it ASAP and want to be able to avoid it in future. – Dmitry Mukhin May 12 '15 at 09:16
  • Assuming VPC: list all the subnets you have associated the ELB with -- *not* where the instances are, but actually associated to the ELB itself (they can be, and usually should be, different). Then, in the VPC console, find the route table associated with each of those subnets, and verify that the default route for all of the subnets is the igw-xxxxxxxx Internet Gateway object. If you have attached the ELB to any subnet whose default route is something other than the "igw," do not change the route -- remove the ELB from those subnets. Please advise what you find. – Michael - sqlbot May 12 '15 at 21:58
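The route-table check described in the last comment can also be scripted. This is a minimal sketch, assuming the route tables have already been fetched (for example with EC2 DescribeRouteTables, filtered to the ELB's VPC) and are passed in as dicts shaped like that API's response; the helper names are hypothetical. A subnet with no explicit association falls back to the VPC's main route table.

```python
from typing import Dict, List, Optional


def route_table_for_subnet(subnet_id: str, route_tables: List[Dict]) -> Optional[Dict]:
    """Explicitly associated route table for subnet_id, else the VPC's main table.

    Assumes route_tables all belong to a single VPC (hypothetical pre-filtering).
    """
    main = None
    for rt in route_tables:
        for assoc in rt.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return rt
            if assoc.get("Main"):
                main = rt
    return main


def default_route_target(rt: Dict) -> Optional[str]:
    """Target ID of the 0.0.0.0/0 route (igw-, nat-, i-, ...), or None if absent."""
    for route in rt.get("Routes", []):
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            return route.get("GatewayId") or route.get("NatGatewayId") or route.get("InstanceId")
    return None


def misrouted_elb_subnets(elb_subnets: List[str], route_tables: List[Dict]) -> Dict[str, Optional[str]]:
    """Map each ELB subnet whose default route is NOT an Internet Gateway to its actual target."""
    bad: Dict[str, Optional[str]] = {}
    for subnet in elb_subnets:
        rt = route_table_for_subnet(subnet, route_tables)
        target = default_route_target(rt) if rt else None
        if not (target or "").startswith("igw-"):
            bad[subnet] = target
    return bad
```

With boto3, the inputs could come from ec2.describe_route_tables(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["RouteTables"] and, for a Classic ELB in a VPC, elb.describe_load_balancers(LoadBalancerNames=[name])["LoadBalancerDescriptions"][0]["Subnets"]. Any subnet in the returned dict is one the comment suggests removing the ELB from.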

1 Answer


Route 53 health checks specifically support ELB instance health monitoring and failover.

Once enabled, Route 53 automatically configures and manages health checks for individual ELB nodes.

Route 53 DNS Failover is able to evaluate the health of the load balancer and the health of the application running on the EC2 instances behind it. In other words, if any part of the stack goes down, Route 53 detects the failure and routes traffic away from the failed endpoint.

https://aws.amazon.com/blogs/aws/amazon-route-53-elb-integration-dns-failover/
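As a rough illustration of the setup the blog post describes, a primary/secondary pair of failover alias records pointing at two ELBs might be expressed as the ChangeBatch below. This is a sketch, not the post's own example: the record name, set identifiers, ELB DNS names and zone IDs are placeholders, and EvaluateTargetHealth=True is what asks Route 53 to judge the alias target's (the ELB's) health. The alias HostedZoneId is the ELB's canonical hosted zone ID, not the ID of your own zone.

```python
from typing import Dict, Tuple

FAILOVER_ROLES = ("PRIMARY", "SECONDARY")


def failover_alias_change(record_name: str, set_id: str, role: str,
                          elb_dns: str, elb_zone_id: str) -> Dict:
    """One UPSERT change for a failover alias A record that points at an ELB."""
    if role not in FAILOVER_ROLES:
        raise ValueError(f"role must be one of {FAILOVER_ROLES}, got {role!r}")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": set_id,  # must be unique among records with this name and type
            "Failover": role,
            "AliasTarget": {
                "HostedZoneId": elb_zone_id,  # the ELB's canonical hosted zone ID
                "DNSName": elb_dns,
                "EvaluateTargetHealth": True,  # let Route 53 judge the ELB's health
            },
        },
    }


def failover_change_batch(record_name: str,
                          primary: Tuple[str, str],
                          secondary: Tuple[str, str]) -> Dict:
    """ChangeBatch with a PRIMARY and a SECONDARY alias; each tuple is (elb_dns, elb_zone_id)."""
    return {
        "Comment": "ELB failover pair",
        "Changes": [
            failover_alias_change(record_name, "primary-elb", "PRIMARY", *primary),
            failover_alias_change(record_name, "secondary-elb", "SECONDARY", *secondary),
        ],
    }
```

The resulting dict could then be passed to boto3's Route 53 client as change_resource_record_sets(HostedZoneId=<your hosted zone ID>, ChangeBatch=batch).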

Basically this gets around the issue of individual ELB nodes not having a fixed IP, and the fact that it can be difficult to tell if your application or the ELB itself is failing.
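To tell a failing ELB node apart from a failing application by hand, one option is to bypass DNS round-robin and probe each node address individually. The sketch below is an illustration, not an AWS tool: it assumes an ELB with a plain TCP or HTTP listener, uses only the Python standard library, and its function names are hypothetical. Because the ELB's DNS name may not return every node on every lookup, it is worth resolving it more than once.

```python
import socket
from typing import Callable, Dict, List


def resolve_nodes(hostname: str, port: int = 80) -> List[str]:
    """Return the distinct IPv4 addresses currently published for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def tcp_ok(ip: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP handshake with ip:port completes within timeout seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def probe_nodes(ips: List[str], port: int = 80,
                check: Callable[[str, int], bool] = tcp_ok) -> Dict[str, bool]:
    """Probe each node IP directly, so one dead node cannot hide behind DNS round-robin."""
    return {ip: check(ip, port) for ip in ips}
```

For example, probe_nodes(resolve_nodes("my-elb-123.us-east-1.elb.amazonaws.com")) (a hypothetical hostname) returns a dict mapping each node IP to True or False; with one unreachable node out of two, DNS round-robin alone would plausibly produce a failure rate around 50%. A TCP handshake only shows that the listener accepts connections; an HTTP request to each IP with the original Host header would be a stricter test.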

You should be able to use this to fail over either to a separate ELB in the same region or to an entirely different region. You can set the Route 53 health check interval to as often as once every 10 seconds, and the TTL on Route 53 alias records pointing at an ELB is usually 60 seconds, which should give you some idea of how quickly failover will occur.
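To turn those numbers into a rough worst-case figure: a Route 53 health check marks an endpoint unhealthy only after a number of consecutive failed checks (FailureThreshold, default 3), and resolvers that already cached the old answer may keep using it for up to one TTL. The hypothetical helper below simply adds the two; it is a back-of-the-envelope estimate under those assumptions and ignores propagation delays and clients that cache DNS longer than the TTL.

```python
VALID_INTERVALS = (10, 30)  # Route 53 health checks run every 10 s ("fast") or 30 s


def worst_case_failover_seconds(interval: int = 10,
                                failure_threshold: int = 3,
                                ttl: int = 60) -> int:
    """Rough upper bound: time to declare the endpoint unhealthy plus one DNS TTL."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {VALID_INTERVALS}")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure_threshold must be between 1 and 10")
    detection = interval * failure_threshold  # consecutive failed checks needed
    return detection + ttl  # cached answers can outlive detection by up to one TTL
```

With the figures from this answer (10 second interval, 60 second TTL) and the default threshold of 3, worst_case_failover_seconds() gives 90 seconds; with the standard 30 second interval it gives 150.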

thexacre