I have defined some alerts with expressions that look like this:
sum(rate(some_error_metric[1m])) BY (namespace,application) > 10
sum(rate(some_other_error_metric[1m])) BY (namespace,application) > 10
...
The above alerts currently fire when any of our applications emits these metrics at a rate above 10. (Note that rate() returns a per-second average, here over a 1-minute window, so this threshold is effectively 10 per second, not 10 per minute.)
Rather than hard-coding a threshold of 10, I want to be able to specify a different threshold for each application, e.g. application_1 should alert at a threshold of 10, application_2 at a threshold of 20, etc.
Is this possible without duplicating the alerts for each application?
This Stack Overflow question, "Dynamic label values in Promethues alerting rules", suggests that it might be possible to achieve what I want using recording rules. However, following the pattern suggested in the only answer to that question results in recording rules that Prometheus doesn't seem to be able to parse:
- record: application_1_warning_threshold
  expr: warning_threshold{application="application_1"} 10
- record: application_2_warning_threshold
  expr: warning_threshold{application="application_2"} 20
...
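For comparison, here is a minimal sketch of how the same idea could be written so that Prometheus should accept it. It assumes that each threshold is recorded as a constant via vector() with an application label attached through the rule's labels field, and that the alert compares against it using on (application) group_left. The group names and the alert name SomeErrorRateHigh are hypothetical placeholders, not taken from the linked answer.

```yaml
groups:
  # Hypothetical group holding one constant threshold series per application.
  - name: thresholds
    rules:
      - record: warning_threshold
        expr: vector(10)
        labels:
          application: application_1
      - record: warning_threshold
        expr: vector(20)
        labels:
          application: application_2

  # Hypothetical alert group: a single rule covers every application.
  - name: alerts
    rules:
      - alert: SomeErrorRateHigh
        # Compare each (namespace, application) rate with the threshold
        # series that carries the same application label.
        expr: |
          sum(rate(some_error_metric[1m])) by (namespace, application)
            > on (application) group_left
          warning_threshold
```

With this layout, an application that has no warning_threshold series would produce no match and therefore never alert, so a default threshold would need to be handled separately.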