I have a service in AWS which scales automatically between a small number of EC2 instances (let's say 4) and a larger number (dozens).
When the service is running on a small number of instances, it would make sense to use a spread placement group to make sure these instances do not end up on the same rack. However, a spread placement group is currently limited to 7 running instances per Availability Zone per group, which becomes a problem when scaling out.

What would be the best way to combine a spread placement group for a small number of instances with autoscaling to a large number of instances?

One idea is to create two ASGs: one for the minimum number of instances, which I want to run in a spread placement group, and a second ASG that scales from 0 to dozens of instances and runs them outside of any placement group. This does seem complicated. Is there a simpler way to do it?

Andrew

1 Answer

The use case for a spread placement group is:

"critical instances that should be kept separate from each other. ... a spread placement group reduces the risk of simultaneous failures that might occur when instances share the same racks."

But wouldn't an Auto Scaling Group spread over several AZs also achieve a similar end, without the constraints of a placement group? If the answer is no, then it would be interesting to understand the characteristics of the app that make it so.

There is a fundamental problem with mixing an Auto Scaling Group and a placement group (cluster or spread): the Auto Scaling Group might not be able to launch an instance, either because there is no capacity (cluster) or because the limit of 7 running instances per AZ has been hit (spread).
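To put rough numbers on the spread case, here is a minimal sketch of the arithmetic. It assumes the limit is 7 running instances per AZ per group and that the ASG can use every AZ you give it; the function names are hypothetical and nothing here calls AWS.

```python
# Assumed limit for a rack-level spread placement group: 7 running
# instances per Availability Zone, per group.
SPREAD_LIMIT_PER_AZ = 7


def spread_capacity(az_count: int, limit_per_az: int = SPREAD_LIMIT_PER_AZ) -> int:
    """Upper bound on running instances one spread group can hold."""
    if az_count < 0 or limit_per_az < 0:
        raise ValueError("az_count and limit_per_az must be non-negative")
    return az_count * limit_per_az


def unlaunchable(desired: int, az_count: int) -> int:
    """How many of the desired instances could not fit in the spread group."""
    return max(0, desired - spread_capacity(az_count))


if __name__ == "__main__":
    # Two AZs and a scale-out target of 36 instances.
    print(spread_capacity(2))
    print(unlaunchable(36, 2))
```

So with 2 AZs, anything beyond 14 instances is a launch the ASG would keep failing on if it were tied to the spread group.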

You had two questions: What is the best way? Is there a simpler way to do it?

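The best way is probably a multi-AZ ASG, and that would be the simplest too. There is also the partition placement group, which sits somewhere between cluster and spread: instances are grouped into partitions, and each partition runs on its own set of racks. That may be worth a look if it fits your application.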

If you really do want to take advantage of a spread placement group for rack-level fault tolerance, so that you lose at most 1 node if a rack is lost, then I think your suggestion may work. But you will have the problem that the two ASGs won't coordinate with each other, so you might need a bespoke solution to manage scaling across the two.
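As a rough idea of what that bespoke piece might look like, here is a sketch of the one decision it has to make: given a total desired capacity, how much goes to the ASG in the spread group and how much to the overflow ASG. The names (`split_desired`, `base_min`, `base_max`) are made up for illustration, and it only does the arithmetic; applying the result to the two ASGs is left out.

```python
from typing import NamedTuple


class Split(NamedTuple):
    base: int      # desired capacity for the ASG inside the spread group
    overflow: int  # desired capacity for the ASG outside any placement group


def split_desired(total: int, base_min: int, base_max: int) -> Split:
    """Fill the spread-group ASG first, then send the rest to the overflow ASG.

    base_max should not exceed what the spread group can hold
    (assumed here to be 7 per AZ times the number of AZs).
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not 0 <= base_min <= base_max:
        raise ValueError("need 0 <= base_min <= base_max")
    # Keep the base ASG between its minimum and maximum size.
    base = min(max(total, base_min), base_max)
    # Anything the base ASG cannot take goes to the overflow ASG.
    overflow = max(0, total - base)
    return Split(base, overflow)


if __name__ == "__main__":
    # Base ASG pinned at 4 instances, as in the question.
    print(split_desired(30, base_min=4, base_max=4))
```

With the base ASG pinned at 4, a target of 30 comes out as 4 in the spread group and 26 outside it. Something would still have to compute the total and push both numbers out, which is where the extra complexity comes from.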

P Burke
  • Thanks. It is true, an ASG over several AZs does achieve a similar effect. However, let's say I am running the service in 2 AZs, 2 instances per AZ. Within a single AZ I only have two instances, which may end up on the same hardware. This means that the failure of a single hardware component is indistinguishable from an AZ failure, as both cause all instances in that AZ to be down. – Andrew Mar 25 '21 at 14:18