I have Prometheus with some alerting rules defined, and I want to collect statistics on the number of alerts fired by Prometheus.

I tried to count how many times an alert has fired using Grafana, but it doesn't work:

SUM(ALERTS{alertname="XXX", alertstate="firing"})

Is there a way to count how many times an alert has fired?

shenobi

3 Answers

Your query returns how many alerts are firing now, not how many times each alert was fired.

I've found this query to (mostly) work with Prometheus 2.4.0 and later:

changes(ALERTS_FOR_STATE[24h])

It will return the number of times each alert went from "pending" to "firing" during the last 24 hours, meaning it will only work for alerts that have a pending state in the first place (i.e. alerts with for: <some_duration> specified).

ALERTS_FOR_STATE is a newly added Prometheus-internal metric that is used for restoring alerts after a Prometheus restart. It's not all that well documented (not at all, actually), but it seems to work.
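
To see what the metric holds, you can query it directly. As far as I can tell from the Prometheus source (an assumption, since it is undocumented), the value of each ALERTS_FOR_STATE series is the Unix timestamp at which the alert became active, so a new value appears each time the alert starts a new active period. "XXX" is the placeholder alert name from the question; run each expression on its own.

```promql
# Assumed: value = Unix timestamp (seconds) at which this alert became active
ALERTS_FOR_STATE{alertname="XXX"}

# Under the same assumption: seconds each active "XXX" alert has been active so far
time() - ALERTS_FOR_STATE{alertname="XXX"}
```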

Oh, and if you want the results grouped by alert (or environment, or job, or whatever) you can sum the results by that label or set of labels:

sum by(alertname) (changes(ALERTS_FOR_STATE[24h]))

will give you how many times each alert fired across jobs, environments etc.

Alin Sînpălean
  • A good [article](https://tiantiankan.me/a/5cf5562daada82366d5adebd) on this topic. We have to add one because going from not existing to existing does not count as a change in value for the purposes of changes(). – Kirill K Oct 11 '19 at 08:05
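
Applying that correction to the grouped query from the answer could look like the sketch below. The + 1 is added per series before summing, so a series that was already active before the 24h window started (and never changed inside it) is also counted once.

```promql
# Per-series value changes over 24h, plus 1 for the series' first appearance, summed per alert name
sum by(alertname) (changes(ALERTS_FOR_STATE[24h]) + 1)
```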

Inspired by Alin Sînpălean's answer, I count only alerts in the firing state and ignore the pending state.

  • Count current alerts:

    count(ALERTS{alertstate="firing"}) by(alertname)
    
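  • Count how many times each alert has been triggered. First, add a recording rule to a rule file:

    # Recording rule: keep ALERTS_FOR_STATE only while the matching alert is firing
    groups:
    - name: recording_rules
      rules:
      - record: ALERTS_FOR_STATE:firing
        expr: ALERTS_FOR_STATE and ignoring(alertstate) ALERTS{alertstate="firing"}

    Then query the recorded series (the + 1 counts each series' first appearance, which changes() ignores):

    sum(changes(ALERTS_FOR_STATE:firing[1w]) + 1) by(alertname)
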
LeoHsiao

Your PromQL is valid. Keep in mind, though, that labels (including alertname) are case-sensitive; perhaps that is the issue?
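
If case is the problem, a regex matcher with RE2's (?i) flag matches the alert name regardless of case. "xxx" is still the placeholder from the question. Note that this still sums the alerts firing right now, not how many times they have fired.

```promql
# Currently firing instances of an alert named "xxx" in any letter case (regex matchers are fully anchored)
sum(ALERTS{alertname=~"(?i)xxx", alertstate="firing"})
```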