(Apologies in advance, I am new to AWS.)
I am using a CloudFormation stack to manage my ECS cluster.
Let's say we have an Auto Scaling group (ASG) with a desired capacity of 5 EC2 instances (MinSize: 1, MaxSize: 7). When I manually change the desired capacity from 5 to 2 and apply it through a change set on the cluster stack, the number of instances is reduced by shutting down all the surplus instances at once. That leaves no time to reschedule their containers onto the remaining instances. So, when going from 5 to 2 instances, all 3 instances are terminated immediately. If, by bad luck, all the containers of one type were running on those 3 machines, none of them is left and the service is down.
Is it possible to have a "cooldown" between terminations? A scaling policy obviously won't help, since we do not want to set up a metric: none of the available metrics fits my case.
Please find some logs below:
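To make the question concrete, here is a rough sketch of the behaviour I am after, written as a standalone Python script rather than as CloudFormation. It assumes a client that behaves like boto3's Auto Scaling client (`set_desired_capacity`); the function name `scale_in_gradually`, the `pause_seconds` parameter and its default are hypothetical names of mine. Changing the capacity outside the stack would also make the group drift from the template, so this is only an illustration of the idea, not something I consider a proper solution.

```python
import time


def scale_in_gradually(asg_client, group_name, current, target,
                       pause_seconds=300, sleep=time.sleep):
    """Lower the desired capacity one instance at a time, pausing between steps.

    asg_client is assumed to behave like boto3.client("autoscaling").
    Returns the list of desired-capacity values that were requested.
    """
    if target > current:
        raise ValueError("target must not exceed the current capacity")

    requested = []
    # Step down by one: for current=5 and target=2 this requests 4, 3, 2.
    for capacity in range(current - 1, target - 1, -1):
        asg_client.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=capacity,
            HonorCooldown=False,
        )
        requested.append(capacity)
        if capacity > target:
            # Pause so the containers of the terminated instance can be
            # rescheduled on the remaining instances before the next step.
            sleep(pause_seconds)
    return requested
```

With boto3 installed, this would be called as `scale_in_gradually(boto3.client("autoscaling"), "my-asg", current=5, target=2)`, where `my-asg` is a placeholder group name.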
2021-01-15 15:45:52 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Rolling update initiated. Terminating 3 obsolete instance(s) in batches of 1, while keeping at least 1 instance(s) in service. Waiting on resource signals with a timeout of PT5M when new instances are added to the autoscaling group.
2021-01-15 15:45:52 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Temporarily setting autoscaling group MinSize and DesiredCapacity to 3.
2021-01-15 15:45:54 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 15:47:40 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 15:47:40 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Successfully terminated instance(s) [i-X] (Progress 33%).
2021-01-15 15:52:42 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 15:53:59 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 15:53:59 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Successfully terminated instance(s) [i-X] (Progress 67%).
2021-01-15 15:59:02 UTC+0100 dev-cluster UPDATE_ROLLBACK_IN_PROGRESS The following resource(s) failed to update: [autoScalingGroup].
2021-01-15 15:59:17 UTC+0100 securityGroup UPDATE_IN_PROGRESS -
2021-01-15 15:59:32 UTC+0100 securityGroup UPDATE_COMPLETE -
2021-01-15 15:59:33 UTC+0100 launchConfiguration UPDATE_COMPLETE -
2021-01-15 15:59:34 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS -
2021-01-15 15:59:37 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Rolling update initiated. Terminating 2 obsolete instance(s) in batches of 1, while keeping at least 1 instance(s) in service. Waiting on resource signals with a timeout of PT5M when new instances are added to the autoscaling group.
2021-01-15 15:59:37 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Temporarily setting autoscaling group MinSize and DesiredCapacity to 3.
2021-01-15 15:59:38 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 16:01:25 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 16:01:25 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Successfully terminated instance(s) [i-X] (Progress 50%).
2021-01-15 16:01:46 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Received SUCCESS signal with UniqueId i-X
2021-01-15 16:01:47 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 16:03:34 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 16:03:34 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Received SUCCESS signal with UniqueId i-X
2021-01-15 16:03:34 UTC+0100 autoScalingGroup UPDATE_IN_PROGRESS Successfully terminated instance(s) [i-X] (Progress 100%).
2021-01-15 16:03:37 UTC+0100 autoScalingGroup UPDATE_COMPLETE -
2021-01-15 16:03:37 UTC+0100 dev-cluster UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS -
2021-01-15 16:03:38 UTC+0100 launchConfiguration DELETE_IN_PROGRESS -
2021-01-15 16:03:39 UTC+0100 dev-cluster UPDATE_ROLLBACK_COMPLETE -
2021-01-15 16:03:39 UTC+0100 launchConfiguration DELETE_COMPLETE -
Thanks in advance for your help!