I am trying to create alerts in Grafana for Spring Boot metrics scraped by Prometheus. The use case is to alert on exceptions thrown by each service. I'm using the http_server_requests_seconds_count metric, and below is a breakdown of the PromQL query I'm using to build the graphs.
First I'm excluding all the series that don't record an exception:
http_server_requests_seconds_count{application="my-service-1",exception!~"None"}
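For context, Micrometer tags this metric per endpoint, so the selector above returns one series per method/uri/status/exception combination, roughly like this (label set abbreviated, values illustrative):

http_server_requests_seconds_count{application="my-service-1", exception="NullPointerException", method="GET", status="500", uri="/api/orders"}  42
http_server_requests_seconds_count{application="my-service-1", exception="IllegalStateException", method="POST", status="500", uri="/api/payments"}  7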
Next I've applied the rate() function, since the raw metric is just a monotonically increasing counter:

rate(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m])
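For reference, a grouped variant that collapses those per-endpoint series into one series per exception type would look like this (still a per-second rate, not a count; the sum by clause is my sketch, not something I currently have in place):

sum by (exception) (rate(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m]))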
Then I've used the following condition to trigger an alert. (I'm using the max() reducer because the sum() and count() reducers take all the data points in the window into consideration, which is not my requirement.)

WHEN max() OF query(A,5m,now) IS ABOVE 0.02
EVALUATE every 1m FOR 5m
The above setup works fine and sends a notification whenever the alert condition is met. However, I'm facing several problems with this approach.
- I need the actual count of exceptions instead of a rate
I've tried the following approach to solve this, but it still gives a monotonically increasing value unless a new exception is thrown (see also the increase() sketch below this list).
count_over_time(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m])
- I'm getting a separate series for each exception, and unless the alert state has gone back to OK, Grafana will not send a second notification when the condition is met by a different series.
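For what it's worth, my understanding is that increase() returns the (extrapolated) counter delta over the window rather than a per-second rate, so a sketch along these lines might be closer to an actual count per exception, though I haven't verified how it behaves under Grafana's alert reducers:

increase(http_server_requests_seconds_count{application="my-service-1",exception!~"None"}[5m])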
How can I address these issues so that Grafana alerts on each new exception and reports the actual count instead of a rate?
Appreciate your kind help!