
Consider the following scenario:

There is a service that runs 24/7, and any downtime is extremely expensive. The service is deployed on Amazon EC2. I am aware of the importance of deploying the application in two different availability zones, and even in two different regions, to avoid single points of failure. But...

My question is whether there are any additional configuration issues that may affect the redundancy of an application. This includes misconfiguration (for example, a wrong DNS configuration that makes the application fail during a failover).

Just to make sure I am clear: I am trying to create a list of checks that should be run in order to verify the redundancy of an application deployed on EC2.
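To illustrate the kind of check I have in mind, here is a minimal sketch in Python for the DNS case. It assumes a setup where the service name is expected to resolve to more than one address (for example, one per zone); the function names and the threshold of two addresses are my own placeholders, not anything EC2 prescribes. A name that passes this check can still be misconfigured in other ways, so it is only one item on the list.

```python
import socket


def resolved_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the distinct IP addresses the hostname currently resolves to."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def has_redundant_dns(hostname, minimum=2, resolver=socket.getaddrinfo):
    """True if the name resolves to at least `minimum` distinct addresses."""
    return len(resolved_addresses(hostname, resolver=resolver)) >= minimum
```

The `resolver` parameter exists only so the check can be exercised with a fake resolver instead of real DNS.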

Thank you all!

Community
gads

1 Answer


A word of warning: putting your services in two availability zones doesn't by itself make you fault tolerant.

For example, one setup I had was 4 servers behind a load balancer, with us-east-1a and us-east-1b as the two zones. Amazon's outage a few months ago caused problems for my software because the load balancers weren't working properly. They were still forwarding requests, but the two dead instances I had in one of the zones were still receiving requests too. Part of the load balancer's job is to remove dead instances, but because the load balancer queue was backlogged, those instances were never removed. In my setup there are two load balancers, one in each zone, so every request to one of the load balancers timed out because it had no live instances to respond. Luckily for me, the browser retried the request against the second load balancer, so the feeds I had were still loading, but very, very slowly.
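The failure above comes down to a zone whose load balancer had no live instance behind it. As a rough sketch, assuming you can collect your own list of `(zone, healthy)` pairs from health probes that don't depend on the load balancer (the function names and the input format here are hypothetical), you could flag such zones like this:

```python
from collections import defaultdict


def healthy_by_zone(instances):
    """Map each zone to the number of instances reported healthy in it.

    `instances` is an iterable of (zone, healthy) pairs.
    """
    counts = defaultdict(int)
    for zone, healthy in instances:
        # Adding 0 still registers the zone, so all-dead zones show up.
        counts[zone] += 1 if healthy else 0
    return dict(counts)


def zones_with_no_healthy_instance(instances):
    """Return the zones where every known instance is unhealthy."""
    return sorted(z for z, n in healthy_by_zone(instances).items() if n == 0)
```

With two healthy instances in us-east-1a and two dead ones in us-east-1b, this reports us-east-1b as the zone whose load balancer would have nothing to forward to.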

My advice: if you choose to go with two availability zones rather than two regions, make sure your systems in one zone do not depend on any part of the other zone, not even the load balancers. For me, it's not worth the extra cost to launch two completely independent systems in different zones, so I can't rule out hitting this problem again in the future. But if your software is so critical that losing the service for 1 hour would cost more than running the extra hardware, then it's definitely worth the extra servers to set it up correctly.
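One way to audit that independence is to write down which zone each component lives in and what it depends on, then look for dependencies that cross zones. This is only a sketch: the component names, the two dictionaries, and the function name are made up for illustration, and you would have to build the inventory yourself.

```python
def cross_zone_dependencies(zone_of, depends_on):
    """Return (component, dependency) pairs that live in different zones.

    zone_of:    dict mapping component name -> availability zone
    depends_on: dict mapping component name -> list of components it needs
    """
    problems = []
    for component, deps in depends_on.items():
        for dep in deps:
            if zone_of[component] != zone_of[dep]:
                problems.append((component, dep))
    return sorted(problems)


# Hypothetical inventory: lb-a forwards to an instance in the other zone.
zone_of = {
    "lb-a": "us-east-1a", "web-a1": "us-east-1a",
    "lb-b": "us-east-1b", "web-b1": "us-east-1b",
}
depends_on = {"lb-a": ["web-a1", "web-b1"], "lb-b": ["web-b1"]}
print(cross_zone_dependencies(zone_of, depends_on))
```

An empty result means that, according to the inventory, each zone could keep serving if the other disappeared; any pair it prints is a dependency to remove or to accept knowingly.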

I also recommend paying for AWS support and working with their engineers to make sure your design doesn't have any flaws that would undermine high availability.

Recap of the issue I discussed: http://aws.amazon.com/message/67457/

NateEag
bwight