7

I want to check whether a certain metric has been unavailable in Prometheus for 5 minutes.

I am using absent(K_KA_GCPP) with a 5-minute threshold, but it seems I cannot group the absent function by labels such as Site Id.

absent only works when the metric is unavailable for all 4 site IDs. I want to find out whether the metric is absent for any 1 of the 4 site IDs, without hardcoding the site ID labels in the query; it should be generic. Is there any way to do that?

Jan Garaj
Arnav Bose

4 Answers

4

I was able to achieve this with something like the following:

count(up{job="prometheus"} offset 1h) by (project) unless count(up{job="prometheus"} ) by (project)

If the metric existed 1 hour ago but is missing now, the query returns a result, which can trigger an alert. You can add any labels you need to the by clause (that's helpful in alerting, for example).

Source: Prometheus Alert for missing metrics and labels
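Applied to the metric from the question, the same offset technique might look like the sketch below. It assumes the site label is called `site_id`, which is a hypothetical name; substitute whatever label the series actually carry. Exactly when a series counts as "missing now" on the right-hand side depends on Prometheus staleness handling (by default a 5-minute lookback for instant selectors).

```promql
# Sketch only: "site_id" is an assumed label name.
# Left side: sites that had K_KA_GCPP series 1 hour ago.
# Right side: sites that have K_KA_GCPP series at evaluation time.
# "unless" keeps only the sites present 1h ago that are missing now.
count by (site_id) (K_KA_GCPP offset 1h)
  unless
count by (site_id) (K_KA_GCPP)
```

No site values are hardcoded: every site that was present an hour ago is checked automatically.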

Ahmed AbouZaid
2

I feel the offset approach is a great starting point, but it has a big weakness: if there is no sample at the offset time (now minus the offset), the query doesn't return what you'd like.

I reworked the answer from Ahmed to this:

group(present_over_time(myMetric{label1="asd"}[3h])) by (labels) unless group(myMetric{label1="asd"}) by (labels)
  • using a range with present_over_time should fix the aforementioned problem, since a sample anywhere in the window counts
  • group() aggregation, since you don't need the value
  • also, I like to use the actual metric, since up{} reflects the state of the scraped target, not whether the metric itself is present, and I feel the two might not be equivalent
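To match the 5-minute threshold from the question, the right-hand side can also use present_over_time with a short window. This is a sketch that assumes a site label named `site_id` (hypothetical) and a 1-hour outer window chosen only for illustration; the outer window bounds how long a missing site keeps being reported.

```promql
# Sketch only: "site_id" and the 1h window are assumptions.
# Left side: every site that reported K_KA_GCPP at any time in the last 1h.
# Right side: every site that reported K_KA_GCPP at any time in the last 5m.
# Result: sites seen within the last hour but silent for the last 5 minutes.
group by (site_id) (present_over_time(K_KA_GCPP[1h]))
  unless
group by (site_id) (present_over_time(K_KA_GCPP[5m]))
```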
strudelPi
  • +1 @strudelPi. The only thing I changed: I extended the 3h window (to 48h for my alert), because this window defines how long the expression keeps detecting the problem. If the metric stays missing for longer than the window, the expression stops producing the result you rely on. `(group(present_over_time(st_Worker_Queue_Size_Max{deployment="worker"}[48h])) by (deployment, namespace)) unless group(st_Worker_Queue_Size_Max{deployment="worker"}) by (deployment, namespace)` – Ilya May 03 '23 at 01:12
1

There is also the Prometheus absent_over_time function.
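A minimal sketch of absent_over_time with the question's 5-minute threshold. Like absent, it returns at most one series, so without a label matcher (absent_over_time(K_KA_GCPP[5m])) it only fires when no K_KA_GCPP series at all had a sample in the window. Checking one site means naming it; `site_id="site-1"` below is a hypothetical label and value.

```promql
# Returns {site_id="site-1"} with value 1 if that site had no
# K_KA_GCPP samples in the last 5 minutes; returns nothing otherwise.
absent_over_time(K_KA_GCPP{site_id="site-1"}[5m])
```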

0

You can use it as a group! See how to configure an alert rule group.


You can also use the absent_over_time function.


absent returns at most one result, so in your case it cannot report each site ID separately:

 absent(<expr>)

Returns an empty vector if the vector passed to it has any elements and a 1-element vector with the value 1 if the vector passed to it has no elements. This is useful for alerting on when no time series exist for a given metric name and label combination.
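To illustrate the limitation: absent copies labels from equality matchers in its argument, so a per-site check only works by listing each site explicitly, which is exactly the hardcoding the question wants to avoid. The `site_id` label and its values below are hypothetical.

```promql
# Each absent() call yields at most one series, labelled from its
# equality matcher, e.g. {site_id="site-1"} 1 when that site is missing.
absent(K_KA_GCPP{site_id="site-1"})
  or absent(K_KA_GCPP{site_id="site-2"})
```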

Ali.Ghodrat