I am using kube-prometheus-stack and the yaml snippets you see below are part of a PrometheusRule definition.

This is a completely hypothetical scenario, the simplest one I could think of that illustrates my point.

Given this kind of metric:

cpu_usage{job="job-1", must_be_lower_than="50"} 33.72
cpu_usage{job="job-2", must_be_lower_than="80"} 56.89
# imagine there are plenty more lines here
# with various different values for the must_be_lower_than label
# ...

I'd like a single alert that compares each series' value against its own must_be_lower_than label and fires when the value exceeds it. Something like this (it doesn't work as written, it's only meant to demonstrate the idea):

alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > $must_be_lower_than
for: 5m

P.S. I already know I can define alerts like this:

alert: CpuUsageTooHigh50
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 50% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="50"} > 50
for: 5m
---
alert: CpuUsageTooHigh80
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 80% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="80"} > 80
for: 5m

This is not what I'm looking for, because I have to manually define a separate alert for each value of the must_be_lower_than label.

igg

1 Answer

There is currently no way in Prometheus to have this kind of "templating".

The closest you can get is to use recording rules that define the maximum value for each value of the label:

rules:
# one constant series per threshold, carrying the label used for matching
- record: max_cpu_usage
  expr: vector(50)
  labels:
    must_be_lower_than: "50"
- record: max_cpu_usage
  expr: vector(80)
  labels:
    must_be_lower_than: "80"
# ... other possible values

Then use it in your alerting rule:

alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > ON(must_be_lower_than) GROUP_LEFT max_cpu_usage
for: 5m
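
Since each threshold still needs its own recording rule, the rule list can be generated rather than written by hand. Below is a minimal sketch in Python, assuming the set of thresholds is known in advance; the function name build_threshold_rules and its parameters are hypothetical and not part of Prometheus or kube-prometheus-stack. It emits JSON, which most YAML parsers accept, so the output could be placed under a rule group's rules: key.

```python
import json


def build_threshold_rules(thresholds, metric="cpu_usage", label="must_be_lower_than"):
    """Return one recording rule per threshold plus a single alerting rule.

    Hypothetical helper: names and defaults mirror the example above.
    """
    rules = []
    # One constant series per distinct threshold, labelled for matching.
    for value in sorted(set(thresholds)):
        rules.append({
            "record": f"max_{metric}",
            "expr": f"vector({value})",
            "labels": {label: str(value)},
        })
    # A single alert that joins each series to its threshold via the label.
    rules.append({
        "alert": "CpuUsageTooHigh",
        "expr": f"{metric} > on({label}) group_left max_{metric}",
        "for": "5m",
        "annotations": {
            "message": (
                "On job {{ $labels.job }}, the cpu usage has been above "
                "{{ $labels." + label + " }}% for 5 minutes."
            ),
        },
    })
    return rules


if __name__ == "__main__":
    print(json.dumps(build_threshold_rules([50, 80]), indent=2))
```

This only moves the list of thresholds into one place. Any label value that is missing from the list has no matching max_cpu_usage series, so series carrying it would never trigger the alert.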
Michael Doubez
  • I appreciate your answer! I already assumed that what I was asking for was a bit too ambitious, but good to have confirmation. This method you're presenting is interesting, but has more or less the same problem in that I still have to manually define the values of the label that should be alerted on. My backup plan was to use helm templates to define the alerts (same style as what I mention in the P.S.), but I will also consider this. Thanks! – igg Jul 13 '22 at 08:35
  • The only other way is to have something inject the limits you want for each job as a metric. You can do this natively in node_exporter by using .prom files (the textfile collector). It would be nice if, for example, kube-state-metrics could transform annotations into metrics; we could then use that to define limits. – Michael Doubez Jul 13 '22 at 14:31