
I am creating a Grafana dashboard to show the total alert count for each firing alert along with its duration (that is, how long the alert has been in the firing state).

The PromQL query used to capture the total alert count is:

count by (alertname,customerName) (changes(customer_ALERTS[24h]))

The idea is to add two more columns to the Grafana table panel: the alert count and the duration.

Now I need a query that captures the duration for each alert. Can somebody please share some thoughts?

akhinair

1 Answer


If you know the evaluation interval for alerts, then the following PromQL query could be used for calculating the duration in seconds for alerts in firing state over the last 24 hours:

count_over_time(customer_ALERTS[24h]) * <evaluation_interval_in_seconds>
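
As a concrete illustration, assuming a sample is written every 60 seconds (an assumed interval; substitute the one configured in your setup) and using the alertstate="firing" label that appears in the comments below, the query might look like this:

```promql
# Assumption: one sample per 60s evaluation, so each sample stands for 60s of firing time.
count_over_time(customer_ALERTS{alertstate="firing"}[24h]) * 60
```

Dividing the result by 60 or 3600 gives minutes or hours, which may be easier to read in a Grafana table column.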

The query assumes that customer_ALERTS contains samples while the alert is firing and has no samples while the alert isn't firing. If customer_ALERTS instead contains 0 while the alert isn't firing and 1 while the alert is firing, then the following query should be used to determine the duration of alerts in the firing state in seconds:

avg_over_time(customer_ALERTS[24h]) * 24 * 3600

If customer_ALERTS contains other values for the firing / not firing states, then PromQL subqueries could be used for counting samples in the firing state. Also take a look at MetricsQL (the VictoriaMetrics query language) functions such as lifetime(m[d]), share_gt_over_time(m[d], gt) or count_gt_over_time(m[d], gt).
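
A minimal sketch of the subquery approach, assuming the alert counts as firing whenever the value is greater than 0 and that a 1-minute resolution is precise enough (both the threshold and the step are assumptions to adjust for your data):

```promql
# The subquery evaluates the filter every 1m over the last 24h.
# Only points where the value exceeds 0 are kept, and each kept point stands for 60s.
count_over_time((customer_ALERTS > 0)[24h:1m]) * 60
```

Because Prometheus by default looks back up to 5 minutes for the most recent sample, a series that simply stops reporting may be counted slightly past its last sample, so the result is an approximation.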

valyala
  • Thanks. I used the second query and I see that the value for all the alerts is 8600. `(avg_over_time(customer_ALERTS{alertstate="firing",severity="critical"}[24h])) *24 * 3600` Any reason why the value is the same? – akhinair Nov 12 '20 at 17:57
  • Could you look at the graph for `customer_ALERTS{alertstate="firing",severity="critical"}` for the last day? The `avg_over_time()` query expects that the graph has no gaps and contains 0 values when the alert wasn't firing and 1 values when the alert was firing, as outlined in the answer. – valyala Nov 13 '20 at 23:10