
Short of creating ginormous instances, is there any way to either force instances to run on separate physical machines or detect how many physical machines are being used by multiple instances of the same image on Amazon Web Services (AWS)?

I'm thinking about reliability here. If I fool myself into thinking that I have three independent servers for fault tolerance purposes (think Paxos, Quicksilver, ZooKeeper, etc.) because I have three different instances running, but all three end up running on the same physical machine, I could be in for a very, very rude surprise.

I realize the issue may be forced by using separate regions, but it would be nice to know if there is an intra-region or even intra-availability-zone solution, as I'm not sure I've ever seen AWS actually give me more than one availability-zone choice in the supposedly multi-choice pulldown menu when creating an instance.
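For what it's worth, this is roughly the check I've been running to see which zones a region actually offers me; a minimal boto3 sketch, assuming credentials and a default region are already configured:

```python
# Minimal boto3 sketch: list the availability zones this account can
# actually launch into in the current region.
# Assumes AWS credentials and a default region are configured.
import boto3

ec2 = boto3.client("ec2")

for zone in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(zone["ZoneName"], zone["State"])
```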


OK, I appreciate the advice from the first two people to answer my question, but I was trying to simplify the problem without writing a novel by positing three machines in one region. Let me try again.

As I scale a hypothetical app stack up/outward, I'm going to add instances both statically and dynamically ("elastically"). Any manner of failure or disaster can happen (including an entire data center burning to the ground due to, say, an unfortunate breakroom accident involving a microwave, a CD, and two idiots saying "oh yeah? well watch this!!!"), but by far the most likely is a hard machine failure of some sort, followed not too far behind by a dead port. Running multiple instances of the same type T on a single piece of virtualized hardware adds computational power, but not fault tolerance.

Obviously, if I'm scaling up/out, I'm most likely going to be using "larger" instances. If AWS' largest machine has memory size M and processor count C, and I choose an instance type with memory size m such that m > (M/2) or with CPU count c such that c > (C/2), then I guarantee my instances run on separate machines. However, I don't know what Mmax and Cmax are today; I certainly don't know what they will be a year from now, or two years from now, and so on, as Amazon buys Bigger Better Faster.

I know this sounds like nitpicking and belaboring the point, but not knowing how instances are distributed, or whether there is a mechanism to control instance distribution, means I can make genuine mistakes in my assumptions: in calculating the effective F+1 or 2F+1 for current distributed computing algorithms, or evaluating new algorithms for use in new applications; in sharding and locality decisions; in minimum reserved vs. elastic instance counts for portions of the app stack that see less traffic; and so on.
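Just to make the F+1 / 2F+1 arithmetic concrete, here's a toy calculation in plain Python (no AWS calls); the assumption that each separate physical machine counts as one independent failure domain is mine, not something AWS guarantees:

```python
# Toy quorum arithmetic: no AWS calls, just the F+1 / 2F+1 counting.
# Assumption (mine): each separate physical machine is an independent
# failure domain; co-located instances are not.

def copies_needed(f: int) -> int:
    """F+1 replicas so at least one copy survives F crash failures."""
    return f + 1

def quorum_members_needed(f: int) -> int:
    """2F+1 members so a majority quorum survives F crash failures
    (Paxos / ZooKeeper-style majority protocols)."""
    return 2 * f + 1

def failures_tolerated(independent_machines: int) -> int:
    """Given N truly independent machines in a majority-quorum group,
    how many can fail before the quorum is lost?"""
    return (independent_machines - 1) // 2

if __name__ == "__main__":
    print(quorum_members_needed(1))  # 3 instances to tolerate 1 failure
    print(failures_tolerated(3))     # 3 independent machines -> tolerate 1
    # But 3 instances that all landed on ONE physical machine are,
    # in effect, a single failure domain:
    print(failures_tolerated(1))     # -> tolerate 0
```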

JackLThornton

2 Answers


You always have at least two availability zones per region, and that should work for high-availability scenarios. Staying within a single AZ would not get you very far on reliability anyway, as a whole AZ may go down (unlikely, but possible).
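A minimal sketch of what spreading across AZs looks like with boto3 (the AMI ID, instance type, and region are placeholders; adjust for your setup):

```python
# Sketch: spread three identical instances across distinct AZs so a
# single-AZ (or single-host) failure can't take out the whole group.
# The AMI ID, instance type, and region below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

zones = [z["ZoneName"]
         for z in ec2.describe_availability_zones()["AvailabilityZones"]
         if z["State"] == "available"]

for az in zones[:3]:
    ec2.run_instances(
        ImageId="ami-12345678",          # placeholder AMI
        InstanceType="m1.large",         # placeholder type
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": az},
    )
```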

If you absolutely must force separate hardware within a single AZ, dedicated instances in different accounts would achieve that, but they would cost more and would not buy you much over spreading across AZs.
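For completeness, requesting dedicated tenancy looks roughly like this (boto3 sketch; the AMI ID and instance type are placeholders, and dedicated tenancy carries an extra cost):

```python
# Sketch: request dedicated tenancy so the instance runs on hardware
# not shared with other AWS accounts. Note: this isolates you from
# OTHER customers' instances; instances from your own account can
# still share the same dedicated hardware, hence the "different
# accounts" suggestion above. AMI ID and type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.run_instances(
    ImageId="ami-12345678",              # placeholder AMI
    InstanceType="m1.large",             # placeholder type
    MinCount=1,
    MaxCount=1,
    Placement={"Tenancy": "dedicated"},  # dedicated hardware
)
```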

Julio Faerman

Not only are there multiple availability zones (think separate data centers) within each region, you can also have servers split across different regions (west coast, east coast, Europe, etc.).

As far as redundancy and reliability are concerned, you're much better off spreading your work across AZs and regions than trying to figure out whether instances within a single AZ are on the same piece of hardware, or trying to ensure that they aren't.
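If you want to sanity-check how your fleet is actually spread out, something like this boto3 sketch will count running instances per region and availability zone (assumes credentials are already configured):

```python
# Sketch: count running instances per region and availability zone,
# as a sanity check that the fleet really is spread out.
# Assumes AWS credentials are already configured.
from collections import Counter

import boto3

counts = Counter()
regions = [r["RegionName"]
           for r in boto3.client("ec2").describe_regions()["Regions"]]

for region in regions:
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            az = instance["Placement"]["AvailabilityZone"]
            counts[(region, az)] += 1

for (region, az), n in sorted(counts.items()):
    print(f"{region:15s} {az:18s} {n}")
```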

E.J. Brennan