3

I am deploying an AutoScalingGroup with AutoScalingPolicies (ScaleUp & ScaleDown) triggered by CloudWatch Alarm (CPU > 70%, CPU < 10%).

AutoScaling is working great but...Once the AutoScalingGroup has reached the minimal number of instance (2), the CPU < 10% alarm stays in ALARM STATE for hours...days...without resetting to OK STATE.

Because the CPUs utilization stay under 10%, I know it is normal the alarm never back to OK STATE.

I know it exists some AlarmActions like:

arn:aws:automate:${AWS::Region}:ec2:recover (for EC2)

I searched for similar Cloudwatch actions, did not find anything.

I have a custom solution: using a Lambda to change the Alarm State to OK but I would like to know if a smarter/easier solution exists.

Does anybody know how to do that?

Thanks.

k0nezzstark
  • 156
  • 1
  • 10

2 Answers2

3

Sounds like what you need is the ability to aggregate alarms with an AND clause. Alarm if CPU < 10% AND instance_count > 2. Unfortunately CloudWatch does not allow you to combine alarms like that directly.

The current solution to this problem is to use Metric Math to create a metric that meets your criteria and then alarm on that.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Create-alarm-on-metric-math-expression.html

Here's the list of available functions:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html#metric-math-syntax

You will have to work out the Math to see if this is possible for you.

CPU+10+(-10*CEIL((instance_count-2)/<MAX_ALLOWED_INSTANCE_COUNT>))

Matt Urtnowski
  • 2,556
  • 1
  • 18
  • 36
-1

You can also subscribe a Lambda function to the SNS topic that resets the alarm:

import boto3

# Create CloudWatch client
cloudwatch = boto3.client('cloudwatch')

# Reset the testalarm to OK
def resetAlarmState(event,context):
    response = cloudwatch.set_alarm_state(
        AlarmName='testalarm',
        StateValue='OK',
        StateReason='Resetting to OK'
    )
  • The [documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudwatch.html#CloudWatch.Client.set_alarm_state) says that the set_alarm_state method is for testing purposes only, and that the alarm will return to its previous state. – FoxMulder900 Jun 21 '21 at 22:44