I want to achieve the specified SLIs (Service Level Indicators) for our HTTP endpoints, using blackbox exporter for probing. The indicators are: 80% availability, and latency less than 1s.

For latency, I figured I can use the query probe_http_duration_seconds > 1, but for availability I am not sure I am doing it correctly with quantile_over_time(0.80, probe_http_status_code)[1d] > 400. The > 400 condition is meant to catch HTTP errors, because I assume an HTTP status code above 400 is an error. Is this correct for my case? If not, please guide me. Thanks.

sal

1 Answer

If you want to calculate the ratio of successful probes to the number of all recorded probes:

count_over_time((probe_http_status_code<400)[1d:])/count_over_time(probe_http_status_code[1d:])

If you want to find the ratio of successful probes to the number of all possible probes (accounting for probes that were never executed, for example because blackbox_exporter was down):

count_over_time((probe_http_status_code<400)[1d:])/1440

where 1440 is the number of possible probes within the specified time range (1440 is the result of 1d / 1m, assuming scrape_interval is 1 minute; change it according to your setup).
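
To make the difference between the two denominators concrete, here is a minimal Python sketch of the same arithmetic outside Prometheus. The function names and the sample data are hypothetical and only illustrate the calculation; they are not part of blackbox_exporter or Prometheus.

```python
from datetime import timedelta


def expected_probe_count(window: timedelta, scrape_interval: timedelta) -> int:
    """Number of probes that should have run in the window (1d / 1m = 1440)."""
    return int(window / scrape_interval)


def availability(status_codes, expected=None):
    """Ratio of probes with status < 400 to recorded (or expected) probes.

    Mirrors the `< 400` filter used in the queries above. If `expected` is
    given, it is used as the denominator instead of the recorded count.
    """
    successful = sum(1 for code in status_codes if code < 400)
    total = len(status_codes) if expected is None else expected
    if total == 0:
        raise ValueError("no probes to evaluate")
    return successful / total


# Hypothetical day of data: 1200 successful probes, 100 errors,
# and 140 probes that never ran (so they are simply absent).
samples = [200] * 1200 + [503] * 100
expected = expected_probe_count(timedelta(days=1), timedelta(minutes=1))

recorded_ratio = availability(samples)            # 1200 / 1300
possible_ratio = availability(samples, expected)  # 1200 / 1440
print(expected, recorded_ratio, possible_ratio)
```

With this sample data the first ratio ignores the missing probes while the second counts them against availability, so the second is the stricter figure to compare with the 80% target.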

markalex