I have the following Prometheus alert rule:
- alert: HostHighCpuLoadTSDB
  expr: (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5) and (hour() < 8 or hour() > 9)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Host high CPU load (instance {{ $labels.instance }})
    description: "CPU load is > 5%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
The expression, split across lines for readability:
(100 - (avg by(instance)
(rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5)
and (hour() < 8 or hour() > 9)
Each part works fine on its own. For instance, when I create an alert rule with the expression:
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5
it creates an alert.
When I create an alert rule with the expression:
hour() < 8 or hour() > 9
it creates an alert.
So both conditions are met: the current hour is less than 8 or greater than 9, and the CPU load is above 5%.
But when I put them together:
(100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5) and (hour() < 8 or hour() > 9)
it doesn't create an alert.
Can someone point me in the right direction? What am I doing wrong?
Prometheus version: v2.41.0
Alertmanager version: v0.25.0
I'm running it in a Kubernetes cluster as kube-prometheus-stack, deployed as a Helm chart.
I want the alert to fire whenever the CPU load is above 5%, except between 8 and 9 o'clock (or even 10, it doesn't matter); during that window the alert should not trigger. I'm testing it at, say, 12 o'clock, and it still doesn't trigger the alert.
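
One possible explanation, offered as an assumption and not a confirmed diagnosis for this setup: in PromQL, `and` only keeps left-hand series that have a right-hand series with exactly matching labels. The CPU side carries an `instance` label (from `avg by(instance)`), while `hour()` returns a single sample with no labels, so nothing matches and the combined result is empty even though each half returns data on its own. A minimal sketch of the same rule with `on()` added, so that `and` ignores labels when matching, might look like this:

```yaml
- alert: HostHighCpuLoadTSDB
  # on() with an empty label list makes `and` match regardless of labels,
  # so the label-less hour() result can gate every per-instance CPU series.
  # Note: hour() is evaluated in UTC, so the window may need shifting
  # for a local time zone.
  expr: >-
    (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",common_name="stg-tsdb-eks"}[2m])) * 100) > 5)
    and on() (hour() < 8 or hour() > 9)
  for: 5m
  labels:
    severity: warning
```

The resulting series keep the labels and value of the left-hand (CPU) side, so `{{ $labels.instance }}` and `{{ $value }}` in the annotations should still resolve as before.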