
I am trying to create alerts in Grafana for Spring Boot metrics scraped by Prometheus. The use-case is to alert on exceptions thrown from each service. I'm using the http_server_requests_seconds_count metric, and below is a breakdown of the PromQL query I'm using to build the graphs.
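
For reference, the raw series exposed by Spring Boot / Micrometer look roughly like the following (the label set and values here are only illustrative and depend on the setup; mine also carry an application tag):

    # illustrative label values only
    http_server_requests_seconds_count{application="my-service-1",exception="None",method="GET",status="200",uri="/api/orders"} 42
    http_server_requests_seconds_count{application="my-service-1",exception="NullPointerException",method="GET",status="500",uri="/api/orders"} 3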

  • First, I exclude all series where no exception was thrown.

    http_server_requests_seconds_count{application="my-service-1",exception!~"None"}

  • Next, I apply the rate() function, since the raw metric is a monotonically increasing counter.

    rate(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m])

  • Then I use the following condition to trigger an alert. (I'm using the max() function because sum() and count() aggregate over the individual data points, which is not what I need.)

    WHEN max() OF query(A,5m,now) IS ABOVE 0.02

    EVALUATE every 1m FOR 5m

The above setup works fine and sends a notification whenever the alert condition is met. However, I'm facing several problems with this approach.

  1. I need the actual count of exceptions instead of a rate

I've tried the following approach to solve this, but it still returns a monotonically increasing value unless a new exception is thrown.

count_over_time(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m])
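
As far as I understand, count_over_time() only counts the number of scraped samples in the window, not how much the counter has grown. I assume something like increase() would be closer to the actual number of exceptions over the window, but I'm not sure it fits my alerting use-case, so the query below is only a sketch of what I'm considering:

    # assumed alternative: counter growth over 5m ≈ number of new exceptions in that window
    increase(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m])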

  2. I'm getting a separate series for each exception, and unless the alert state has gone back to OK, Grafana will not send a notification a second time when the condition is met by a different series.
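
To illustrate the split (I'm assuming the extra series come from the exception label, since that is what differs between them in my data), the per-exception view looks something like this:

    # one series per distinct exception label value (illustrative grouping, not necessarily what I want to alert on)
    sum by (exception) (rate(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m]))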

How can I address the above issues and get Grafana to alert on each new exception and also report the count instead of a rate?

Appreciate your kind help!
