I'm trying to set up an AWS Auto Scaling Group (ASG) that auto-scales based on average group CPU load.

I have a scale-up policy that is supposed to scale the group up by 1 instance once the average CPU usage is higher than 70%. However, when the alarm is triggered, the ASG launches several instances at the same time, which it shouldn't do.

The relevant bits of CloudFormation configuration:

ECSScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
        AdjustmentType: "ChangeInCapacity"
        AutoScalingGroupName: !Ref ECSAutoScalingGroup
        PolicyType: "StepScaling"
        MetricAggregationType: "Average"
        EstimatedInstanceWarmup: 600
        StepAdjustments:
            -
                MetricIntervalLowerBound: "0"
                ScalingAdjustment: "1"

ECSScaleUpAlarm:
    Type: "AWS::CloudWatch::Alarm"
    Properties:
        AlarmDescription: "CPU more than 70% during the last minute."
        AlarmName: "ECSScaleUpAlarm"
        AlarmActions:
            -
                !Ref ECSScaleUpPolicy
        Dimensions:
            -
                Name: "ClusterName"
                Value: !Ref ECSCluster
        MetricName: "CPUReservation"
        Namespace: "AWS/ECS"
        ComparisonOperator: "GreaterThanOrEqualToThreshold"
        Statistic: "Average"
        Threshold: 70
        Period: 60
        EvaluationPeriods: 1
        TreatMissingData: "notBreaching"

As you can see, the scaling adjustment is just 1 and the instance warmup is quite long, so the group should wait much longer before launching a second instance :(

oblio

1 Answer

According to the documentation, the Step scaling policy type increases or decreases the group capacity based on the size of the alarm breach. You need to change the policy type to Simple scaling so that the capacity changes by a single, fixed adjustment.
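
As a rough sketch of that change, the policy from the question could be rewritten as a simple scaling policy along these lines. The `Cooldown` value of 600 seconds is an assumption that mirrors the original `EstimatedInstanceWarmup`. `MetricAggregationType`, `EstimatedInstanceWarmup` and `StepAdjustments` are dropped because they apply to step scaling rather than simple scaling. This has not been tested against the asker's stack:

```yaml
# Sketch only: a simple-scaling variant of the ECSScaleUpPolicy from the question.
ECSScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
        AdjustmentType: "ChangeInCapacity"
        AutoScalingGroupName: !Ref ECSAutoScalingGroup
        PolicyType: "SimpleScaling"
        # Assumption: 600 s cooldown, mirroring the original EstimatedInstanceWarmup.
        Cooldown: "600"
        # Add exactly one instance each time the alarm fires.
        ScalingAdjustment: 1
```

With simple scaling, the group waits for the cooldown to expire before it responds to further alarms, so the alarm resource itself can stay as it is.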

Mahdi
  • It doesn't seem so: ```Simple scaling: execute policy when: ECSScaleUpAlarm breaches the alarm threshold: CPUReservation >= 70 for 3 consecutive periods of 60 seconds for the metric dimensions. Take the action: Add 1 instances And then wait: 200 seconds before allowing another scaling activity``` And yet, 2 instances launched immediately. – oblio Aug 03 '18 at 12:23