
I have a `probe_success{job="my_service"}` metric that the Blackbox Exporter generates once every hour; its value is either 0 or 1.

What's the correct expression to generate an alert?

Is `probe_success{job="my_service"} == 0` correct?


sleekster



Expression-wise, both `probe_success == 0` and `probe_success != 1` are valid choices.
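As a rough sketch, a Prometheus rule file using the first expression could look like the fragment below. The group name, alert name, and `severity` label are hypothetical placeholders, and there is deliberately no `for` clause, since the probe result only shows up once an hour.

```yaml
# Hypothetical rule file (e.g. rules/my_service.yml); names and labels are placeholders.
groups:
  - name: my_service_probes            # hypothetical group name
    rules:
      - alert: MyServiceProbeFailed    # hypothetical alert name
        # True whenever the most recent probe result for the job is 0.
        expr: probe_success{job="my_service"} == 0
        labels:
          severity: warning            # hypothetical label
        annotations:
          summary: "Probe of {{ $labels.instance }} failed"
```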

However, because the metric is produced only once an hour, it cannot be combined with the `for` alert parameter. In other words, your alert will only work without `for`, and without `for` you will get all sorts of false alarms caused by network glitches and other short, random events.

If you want to use `for`, scrape the job at least once every 5 minutes.
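A minimal sketch of that setup, assuming a 1-minute scrape interval (a hypothetical value chosen to stay well inside the 5-minute window after which Prometheus stops returning a series that has not been updated). The existing Blackbox Exporter target and relabeling settings for the job are not shown and would stay as they are.

```yaml
# prometheus.yml (fragment): only the scrape interval changes.
scrape_configs:
  - job_name: my_service
    scrape_interval: 1m    # hypothetical; keep it well under 5m
    # existing metrics_path / params / static_configs / relabel_configs go here unchanged
```

```yaml
# Rule file (fragment); alert name is a hypothetical placeholder.
groups:
  - name: my_service_probes
    rules:
      - alert: MyServiceProbeFailed
        expr: probe_success{job="my_service"} == 0
        # The expression must stay true on every evaluation for 5 minutes
        # before the alert moves from pending to firing.
        for: 5m
```

With this combination, a single failed probe followed by a successful one leaves the alert pending and it never fires, which is what filters out short glitches.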

anemyte
  • Thanks. Why 5 minutes, though? Could I settle on, let's say, 15m? – sleekster Jul 20 '23 at 07:53
  • I don't think I need `for`, though. I just want an alert as soon as a `0` is seen. What would happen (by default) if there were, let's say, three consecutive `0`s? Would we see three alerts? – sleekster Jul 20 '23 at 07:57
  • @sleekster Metrics disappear if they are not updated within 5 minutes (https://stackoverflow.com/a/71107940/11344502). With this setup you will get what you want, meaning the alert will fire on any failure, but: 1) the alert will resolve itself after 5 minutes; 2) short outages might go unnoticed (e.g. the server crashes, stays down for 55 minutes, and comes back online before the next health check). – anemyte Jul 20 '23 at 08:39
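If the hourly probe is kept, one possible way around the "resolves itself after 5 minutes" caveat is to evaluate a range over the last probe interval instead of the latest sample. This is a sketch under assumptions: the 70-minute window is a hypothetical value chosen to slightly exceed the 1-hour probe interval, and the alert name is a placeholder.

```yaml
groups:
  - name: my_service_probes
    rules:
      - alert: MyServiceProbeFailed   # hypothetical name
        # min_over_time considers every sample in the last 70 minutes rather than
        # only the latest sample within the 5-minute staleness window, so a 0 from
        # the hourly probe keeps the expression true until it ages out of the window.
        expr: min_over_time(probe_success{job="my_service"}[70m]) == 0
```

This only addresses the first caveat: an outage that starts and ends between two hourly probes still produces no sample, so it remains invisible.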