
I got the following Prometheus alert:

- alert: HostHighCpuLoadTSDB
  expr: (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5) and (hour() < 8 or hour() > 9)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Host high CPU load (instance {{ $labels.instance }})
    description: "CPU load is > 5%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

The expression on its own:

(100 - (avg by(instance) 
    (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5) 
and (hour() < 8 or hour() > 9)

Each part works fine on its own. For instance, when I create an alert rule with the expression:

100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5

it creates an alert.

When I create an alert rule with the expression hour() < 8 or hour() > 9, it also creates an alert.

So both conditions are met: the hour is less than 8 or greater than 9, and the CPU load is above 5%.
But when I put them together as (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5) and (hour() < 8 or hour() > 9), no alert is created.

Can someone point me in the right direction? What am I doing wrong?

Prometheus version: v2.41.0
Alertmanager version: v0.25.0
I'm running it in a Kubernetes cluster as kube-prometheus-stack, deployed from the Helm chart.

I want the alert to fire whenever the CPU load is above 5%, except between 8 and 9 o'clock (or even 10, it doesn't matter); during that window I don't want it to trigger. I'm testing at, say, 12 o'clock, and the alert still doesn't fire.

markalex
Oleksii K

1 Answer


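The problem is caused by the and operator in your query. By default it matches series on their full label sets: the left side carries the label instance, while hour() returns a single sample with no labels, so nothing matches and the result is empty.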

Use and on() (hour() < 8 or hour() > 9) to resolve this: on() with an empty label list tells Prometheus to ignore all labels when matching the two sides.
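Applied to the rule from the question, the fix might look like the sketch below. The alert name, threshold, labels, and annotations are copied from the question; the only functional change is the on() modifier, and the folded scalar (>-) is used just to break the long expression across lines. Note that hour() is evaluated in UTC, so the 8 and 9 here are UTC hours; whether that matches the intended window is an assumption to verify.

```yaml
- alert: HostHighCpuLoadTSDB
  # Left side keeps the instance label; on() with an empty list makes
  # "and" ignore all labels, so the label-less hour() sample can match.
  expr: >-
    (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5)
    and on() (hour() < 8 or hour() > 9)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Host high CPU load (instance {{ $labels.instance }})
    description: "CPU load is > 5%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

Because and returns the samples of its left-hand side, the resulting alert still carries the instance label, so {{ $labels.instance }} in the annotations continues to work.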

A demo can be seen here.

Generally, when creating an alert rule, I would suggest opening the Prometheus GUI (something like localhost:9090/graph), pasting your expression into the query field, switching to the Graph tab, and selecting an appropriate time range (1w, for example).

That way you can see when your alert would have fired in the past, had it been created earlier.

Also, I believe you could use

unless on() hour()>=8 <=9

instead of the composite and clause, to make your rule more readable.
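As a sketch of how that variant could look in the rule from the question: the expr below drops every result of the CPU condition while the hour is 8 or 9 (UTC, since hour() works in UTC). The parentheses around the hour() comparison are added only for readability; the rest of the rule is assumed to stay as in the question.

```yaml
- alert: HostHighCpuLoadTSDB
  # "unless on()" removes all left-hand samples whenever the right-hand
  # side returns anything, i.e. while 8 <= hour() <= 9.
  expr: >-
    (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5)
    unless on() (hour() >= 8 <= 9)
  for: 5m
  labels:
    severity: warning
```

The chained comparison works because each comparison filters the vector: hour() >= 8 keeps the sample only from hour 8 onward, and <= 9 then keeps it only up to hour 9.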

markalex