I would like to configure a CloudWatch alarm to:

  • sum the last 30 minutes of the ApplicationRequestsTotal metric once every 30 minutes
  • alarm if the sum is equal to 0

I have configured the custom CloudWatch ApplicationRequestsTotal metric to emit once every 60 seconds for my service.

I have configured the alarm as:

{
    "MetricAlarms": [
        {
            "AlarmName": "radio-silence-alarm",
            "AlarmDescription": "Alarm if 0 or less requests are received for 1 consecutive period(s) of 30 minutes.",
            "ActionsEnabled": true,
            "OKActions": [],
            "InsufficientDataActions": [],
            "MetricName": "ApplicationRequestsTotal",
            "Namespace": "AWS/ElasticBeanstalk",
            "Statistic": "Sum",
            "Dimensions": [
                {
                    "Name": "EnvironmentName",
                    "Value": "service-environment"
                }
            ],
            "Period": 1800,
            "EvaluationPeriods": 1,
            "Threshold": 0.0,
            "ComparisonOperator": "LessThanOrEqualToThreshold",
            "TreatMissingData": "missing"
        }
    ],
    "CompositeAlarms": []
}

I have set up many alarms like this and each one seems to:

  • sum the last 30 minutes of ApplicationRequestsTotal metric once EVERY minute

For example, this service started receiving 0 ApplicationRequestsTotal at 8:36 AM, and at exactly 9:06 AM CloudWatch triggered an alarm.

[Screenshot: CloudWatch Alarm seems to evaluate EVERY minute]

The output of aws cloudwatch describe-alarm-history for the above time period:

{
    "AlarmName": "radio-silence-alarm",
    "AlarmType": "MetricAlarm",
    "Timestamp": "2021-09-29T09:06:37.929000+00:00",
    "HistoryItemType": "StateUpdate",
    "HistorySummary": "Alarm updated from OK to ALARM",
    "HistoryData": "{
       "version":"1.0",
       "oldState":{
          "stateValue":"OK",
          "stateReason":"Threshold Crossed: 1 datapoint [42.0 (22/09/21 08:17:00)] was not less than or equal to the threshold (0.0).",
          "stateReasonData":{
             "version":"1.0",
             "queryDate":"2021-09-22T08:47:37.930+0000",
             "startDate":"2021-09-22T08:17:00.000+0000",
             "statistic":"Sum",
             "period":1800,
             "recentDatapoints":[
                42.0
             ],
             "threshold":0.0,
             "evaluatedDatapoints":[
                {
                   "timestamp":"2021-09-22T08:17:00.000+0000",
                   "sampleCount":30.0,
                   "value":42.0
                }
             ]
          }
       },
       "newState":{
          "stateValue":"ALARM",
          "stateReason":"Threshold Crossed: 1 datapoint [0.0 (29/09/21 08:36:00)] was less than or equal to the threshold (0.0).",
          "stateReasonData":{
             "version":"1.0",
             "queryDate":"2021-09-29T09:06:37.926+0000",
             "startDate":"2021-09-29T08:36:00.000+0000",
             "statistic":"Sum",
             "period":1800,
             "recentDatapoints":[
                0.0
             ],
             "threshold":0.0,
             "evaluatedDatapoints":[
                {
                   "timestamp":"2021-09-29T08:36:00.000+0000",
                   "sampleCount":30.0,
                   "value":0.0
                }
             ]
          }
       }
    }"
}
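The history entry itself shows the sliding behavior. The sketch below parses HistoryData, which the real CLI output returns as a JSON string (the dump above shows it expanded for readability). It then computes the window that was evaluated. Two assumptions apply: the datapoint's startDate marks the start of the 30-minute period, and the window ends one Period (1800 seconds) later. The HISTORY_DATA value is a trimmed copy of the newState data shown above, not the full record.

```python
import json
from datetime import datetime, timedelta

# Trimmed copy of the "newState" data from the alarm history above. The real
# describe-alarm-history output holds HistoryData as a JSON *string*, so we
# serialize a dict here to mimic that.
HISTORY_DATA = json.dumps({
    "version": "1.0",
    "newState": {
        "stateValue": "ALARM",
        "stateReasonData": {
            "queryDate": "2021-09-29T09:06:37.926+0000",
            "startDate": "2021-09-29T08:36:00.000+0000",
            "statistic": "Sum",
            "period": 1800,
            "evaluatedDatapoints": [
                {"timestamp": "2021-09-29T08:36:00.000+0000",
                 "sampleCount": 30.0, "value": 0.0}
            ],
        },
    },
})

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"  # e.g. 2021-09-29T08:36:00.000+0000

def evaluated_window(history_data):
    """Return (window_start, window_end, query_time) for the newState evaluation."""
    reason = json.loads(history_data)["newState"]["stateReasonData"]
    start = datetime.strptime(reason["startDate"], TS_FORMAT)
    end = start + timedelta(seconds=reason["period"])  # start + one Period
    query = datetime.strptime(reason["queryDate"], TS_FORMAT)
    return start, end, query

start, end, query = evaluated_window(HISTORY_DATA)
print(f"window {start:%H:%M}-{end:%H:%M}, evaluated at {query:%H:%M:%S}")
# window 08:36-09:06, evaluated at 09:06:37
```

The evaluated window starts at 8:36, not at a :00 or :30 boundary. That is consistent with the window being anchored to the time of evaluation rather than to fixed 30-minute blocks.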

What have I configured incorrectly?

Edited by John Rotenstein; asked by MrSuaveh.

2 Answers

That is not how Amazon CloudWatch works.

When creating an Alarm in CloudWatch, you specify:

  • A metric (e.g. CPU Utilization, or perhaps a Custom Metric being sent to CloudWatch)
  • A time period (e.g. the previous 30 minutes)
  • An aggregation method (e.g. Average, Sum, Count)

For example, CloudWatch can trigger an Alarm if the Average of the metric exceeded a threshold over the previous 30 minutes. This is evaluated continually as a sliding window; it does not look at the metric in distinct 30-minute blocks.

Using your example, it would send an alert whenever the Sum of the metric over the previous 30 minutes is zero, checked on a continual basis. That is why your alarm fired at 9:06, exactly 30 minutes after requests stopped at 8:36.
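A simplified model of sliding versus fixed-block evaluation, using the timeline from the question, can be sketched like this. It makes several assumptions. Datapoints are one per minute, as the question states. The request rate of 1 per minute before 8:36 is hypothetical. Fixed blocks are assumed to align to the hour. CloudWatch's small evaluation delay (the 09:06:37 queryDate) is not modeled.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=30)  # the alarm's Period of 1800 seconds
STEP = timedelta(minutes=1)     # the metric is emitted once every 60 seconds

def minute_series(start, requests_stop, end, rate=1):
    """Hypothetical per-minute ApplicationRequestsTotal values:
    `rate` requests per minute before `requests_stop`, 0 afterwards."""
    series, t = {}, start
    while t < end:
        series[t] = rate if t < requests_stop else 0
        t += STEP
    return series

def first_alarm(series, step):
    """Return the end time of the first 30-minute window whose Sum is <= 0.

    step=STEP models a sliding window re-evaluated every minute;
    step=PERIOD models fixed, non-overlapping 30-minute blocks.
    """
    times = sorted(series)
    n = PERIOD // STEP       # 30 datapoints per window
    stride = step // STEP    # how far the window moves between evaluations
    for i in range(0, len(times) - n + 1, stride):
        window = times[i:i + n]
        if sum(series[t] for t in window) <= 0:  # LessThanOrEqualToThreshold 0.0
            return window[-1] + STEP             # end of that window
    return None

day = datetime(2021, 9, 29, 8, 0)
series = minute_series(day, datetime(2021, 9, 29, 8, 36), day + timedelta(hours=2))
sliding = first_alarm(series, step=STEP)
blocks = first_alarm(series, step=PERIOD)
print(sliding.strftime("%H:%M"))  # 09:06, matching what the question observed
print(blocks.strftime("%H:%M"))   # 09:30, what fixed 30-minute blocks would give
```

With a sliding window, the first all-zero 30-minute Sum covers 8:36 to 9:06. The block starting at 8:30 still contains six minutes of traffic, so a block-based check would wait until the 9:00 to 9:30 block.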

Answered by John Rotenstein
  • Thanks John, that's good to know the metric is continually evaluated. And as far as you know, there isn't any way to change that, correct? – MrSuaveh Sep 30 '21 at 17:00
  • There is no way to change this behaviour. – John Rotenstein Sep 30 '21 at 22:02
  • Thanks for confirming John, much appreciated. – MrSuaveh Sep 30 '21 at 22:15
  • @JohnRotenstein If this is the case, what is that incremental unit called? For example, with a period of 30 minutes it evaluates the aggregation, and since it evaluates continuously, the next evaluation happens at the 31st minute, so this unit would be 1 minute. Is there a term for this value? Also, if the period itself is 60 seconds, what would its value be? – Ankush Jain Dec 03 '22 at 17:33

I think your answer can be found directly in the documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html

Quoting the docs: When you create an alarm, you specify three settings to enable CloudWatch to evaluate when to change the alarm state:

  • Period is the length of time to evaluate the metric or expression to create each individual data point for an alarm. It is expressed in seconds. If you choose one minute as the period, the alarm evaluates the metric once per minute.

  • Evaluation Periods is the number of the most recent periods, or data points, to evaluate when determining alarm state.

  • Datapoints to Alarm is the number of data points within the Evaluation Periods that must be breaching to cause the alarm to go to the ALARM state. The breaching data points don't have to be consecutive, but they must all be within the last number of data points equal to Evaluation Period.

When you configure Evaluation Periods and Datapoints to Alarm as different values, you're setting an "M out of N" alarm. Datapoints to Alarm is ("M") and Evaluation Periods is ("N"). The evaluation interval is the number of data points multiplied by the period. For example, if you configure 4 out of 5 data points with a period of 1 minute, the evaluation interval is 5 minutes. If you configure 3 out of 3 data points with a period of 10 minutes, the evaluation interval is 30 minutes.
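The quoted rules can be sketched as two small functions: an "M out of N" check and the evaluation-interval arithmetic. This is a simplified model, not CloudWatch's implementation. It assumes a LessThanOrEqualToThreshold comparison (as in the question) and one aggregated value per period, and it does not model missing data or TreatMissingData. The question's alarm does not set Datapoints to Alarm, so the sketch assumes it equals EvaluationPeriods (1). The 2-out-of-3 series are hypothetical values.

```python
from datetime import timedelta

def alarm_state(datapoints, threshold, datapoints_to_alarm, evaluation_periods):
    """Simplified "M out of N" check for a LessThanOrEqualToThreshold alarm.

    datapoints: one aggregated value per period, oldest first. Missing data
    (and the TreatMissingData setting) is deliberately not modeled.
    """
    recent = datapoints[-evaluation_periods:]             # the last N data points
    breaching = sum(1 for v in recent if v <= threshold)  # need not be consecutive
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

def evaluation_interval(evaluation_periods, period_seconds):
    """Evaluation interval = number of data points (N) multiplied by the period."""
    return timedelta(seconds=evaluation_periods * period_seconds)

# The docs' examples
print(evaluation_interval(5, 60))    # 4 out of 5, 1-minute period  -> 0:05:00
print(evaluation_interval(3, 600))   # 3 out of 3, 10-minute period -> 0:30:00

# The question's alarm: Period 1800, EvaluationPeriods 1, Threshold 0.0,
# with the 30-minute Sum of 0.0 from the alarm history
print(evaluation_interval(1, 1800))  # -> 0:30:00
print(alarm_state([0.0], 0.0, datapoints_to_alarm=1, evaluation_periods=1))  # ALARM

# Hypothetical 2 out of 3: the two breaching points are not consecutive
print(alarm_state([0.0, 5.0, 0.0], 0.0, datapoints_to_alarm=2, evaluation_periods=3))  # ALARM
print(alarm_state([0.0, 5.0, 3.0], 0.0, datapoints_to_alarm=2, evaluation_periods=3))  # OK
```

The evaluation interval describes how much data each evaluation covers (30 minutes for the question's alarm). It does not describe how often the alarm is evaluated.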

Answered by afj88