I am trying to count how many samples had value == 0 in the past hour in Prometheus, and to create an alert rule based on that.

I came up with the expression count_over_time(instance==0 [1h])/count_over_time(instance)

I get an error indicating that I have to follow the Prometheus aggregator expression.

I'm not sure what the reason is.

I'd really appreciate your help.

qing zhang

1 Answer

There are a couple of mistakes in your query:

  • instance==0 [1h]: A range selector can be applied only to an instant vector selector, not to an arbitrary expression. That is, instance[1h] is valid, but instance==0 [1h] is not. What you need here is a subquery, which would look something like (instance==0)[1h:1m] (choose your own resolution).

  • count_over_time(instance): count_over_time takes a range vector, so you can't pass just instance, which is an instant vector.

Coming to your intended query: as I understand it, you want to know what percentage of the instance series were 0 over the past hour and alert on it. For that, I suggest using the for field in the alert definition, for example:

groups:
- name: example
  rules:
  - alert: ExampleAlert
    expr: count(instance == 0)/count(instance) > 0.5
    for: 1h
    annotations:
        description: "Count of (instances==0) is >50% of instances for more than 1h."

Here, the alert fires if the ratio stays above 0.5 (50%) for a full hour.
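
If what you actually want is the fraction of samples that were 0 within the past hour (closer to your original query), a subquery-based rule is one option. This is a minimal sketch, not a tested rule: the alert name ZeroSampleRatioHigh is hypothetical, and the 1m resolution and 0.5 threshold are assumptions carried over from the examples above.

```yaml
groups:
- name: example
  rules:
  # Hypothetical alert name; resolution and threshold are illustrative only.
  - alert: ZeroSampleRatioHigh
    # Numerator: number of 1m evaluation steps in the last hour where instance == 0.
    # Denominator: number of 1m evaluation steps in the last hour where instance had a value.
    expr: |
      count_over_time((instance == 0)[1h:1m])
        /
      count_over_time(instance[1h:1m])
        > 0.5
    annotations:
      description: "More than 50% of the 1m-resolution samples of instance were 0 in the past hour."
```

Note that a series that was never 0 in the window produces no numerator, so the expression returns nothing for it rather than 0. That is fine for alerting, since an empty result simply means the alert does not fire.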

codesome