
I have an ASG with OldestLaunchTemplate set as the termination policy. One step of our deployment builds the app, creates a new launch template, sets the ASG to use that new launch template, and completes. The following step scales out the ASG, waits for the new instances to become healthy, and then scales the ASG back in. While this is happening I suspend termination so additional scaling actions do not affect the deploy.
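
For context, the scale-out/scale-in step is roughly the following (a minimal boto3 sketch; the ASG name, the suspended process list, and the polling logic are placeholders rather than our actual deployment code):

```python
# Minimal boto3 sketch of the scale-out / scale-in deploy step.
# The ASG name, suspended process list, and polling details are
# illustrative placeholders, not the exact deployment code.
import time

import boto3

autoscaling = boto3.client("autoscaling")
ASG_NAME = "my-app-asg"  # placeholder


def describe_group():
    return autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]


def wait_for_healthy(target_count, poll_seconds=15):
    # Poll until the ASG reports the target number of healthy, in-service instances.
    while True:
        instances = describe_group()["Instances"]
        healthy = [
            i for i in instances
            if i["LifecycleState"] == "InService" and i["HealthStatus"] == "Healthy"
        ]
        if len(healthy) >= target_count:
            return
        time.sleep(poll_seconds)


group = describe_group()
original_desired = group["DesiredCapacity"]
original_max = group["MaxSize"]

# Keep other scaling activity from interfering while the deploy runs.
autoscaling.suspend_processes(
    AutoScalingGroupName=ASG_NAME,
    ScalingProcesses=["AZRebalance", "AlarmNotification", "ScheduledActions"],
)

# Scale out to 2x desired, wait for the new instances, then scale back in.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName=ASG_NAME,
    DesiredCapacity=original_desired * 2,
    MaxSize=max(original_max, original_desired * 2),
)
wait_for_healthy(original_desired * 2)

autoscaling.update_auto_scaling_group(
    AutoScalingGroupName=ASG_NAME,
    DesiredCapacity=original_desired,
    MaxSize=original_max,
)

autoscaling.resume_processes(
    AutoScalingGroupName=ASG_NAME,
    ScalingProcesses=["AZRebalance", "AlarmNotification", "ScheduledActions"],
)
```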

Initially I was simply setting desired/max to 2x the current desired and then dropping back down to the previous desired. This worked, but occasionally left behind instances running the old LT because of how the scale out/in was affected by the ASG being multi-AZ. So I updated the logic to make sure the scale out happened by a minimum multiple of the number of AZs, so that each AZ would have at least one old and one new instance. This worked for a bit, but now I see the ASG continuing to terminate instances with the latest LT instead of terminating all the instances with the older LT, even though instances across AZs would remain in balance.
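
The AZ-aware version just rounds the doubled capacity up to a multiple of the AZ count, roughly like this (a simplified sketch of the idea, not the exact logic):

```python
# Simplified sketch: round the doubled capacity up to the next multiple of the
# AZ count so every AZ gets at least one new instance next to an old one.
import math


def scale_out_target(desired_capacity: int, az_count: int) -> int:
    doubled = 2 * desired_capacity
    return az_count * math.ceil(doubled / az_count)
```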

This seems like basic ASG functionality, but I'm clearly missing something. What else would cause the ASG not to terminate the instances with the oldest LT each time?

1 Answer


Figured out the issue here. For different environments we have different on-demand vs. spot requirements. In one environment I had put a change in place (not long ago; thank you, git blame...) to ensure that at least one instance was on-demand, so the app would always be available and wouldn't be terminated mid-test-suite run.
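
The change boiled down to a mixed instances distribution roughly like this (a simplified sketch; the ASG name, launch template name, and instance type are placeholders):

```python
# Simplified sketch of the change that caused the behavior: pin one on-demand
# base instance so at least one instance is never a spot instance.
# The ASG name, launch template name, and instance type are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-app-asg",  # placeholder
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "my-app-lt",  # placeholder
                "Version": "$Latest",
            },
            "Overrides": [{"InstanceType": "t2.medium"}],
        },
        "InstancesDistribution": {
            # Keep at least one on-demand instance at all times.
            "OnDemandBaseCapacity": 1,
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "lowest-price",
        },
    },
)
```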

From the AWS EC2 Auto Scaling documentation (emphasis mine):

When an Auto Scaling group with a mixed instances policy scales in, Amazon EC2 Auto Scaling still uses termination policies to prioritize which instances to terminate, but first it identifies which of the two types (Spot or On-Demand) should be terminated. It then applies the termination policies in each Availability Zone individually.

It took me a hot second to remember making that change, and of course the termination policy would yield to that one on-demand instance to satisfy the distribution requirement, even as it continued to age across deployments. I rolled that change back, updated from t2 to t3 instances, and made our spot allocation strategy a bit less price-sensitive (from lowest-price to price-capacity-optimized), which should give better availability without aging instances.
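
The updated policy ended up looking roughly like this (again simplified, with placeholder names and overrides):

```python
# Simplified sketch of the updated distribution: no forced on-demand base,
# t3 overrides, and the price-capacity-optimized spot allocation strategy.
# Names and instance types are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-app-asg",  # placeholder
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "my-app-lt",  # placeholder
                "Version": "$Latest",
            },
            "Overrides": [
                {"InstanceType": "t3.medium"},
                {"InstanceType": "t3.large"},
            ],
        },
        "InstancesDistribution": {
            # No on-demand base instance pinned any more.
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    },
)
```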