
Getting NaN values when monitoring a Summary metric of a Java application via Prometheus and Grafana after lowering maxAgeSeconds

Hello everyone, I am using Prometheus and Grafana to monitor my Java application. I record the length of each received message in a Summary metric. All I want is to show in Grafana the maximum received message length within each 1-second interval (there can be multiple messages per second, since my application is a real-time messaging gateway). So I first set the Prometheus server's scrape_interval to 1 second, so that it samples the metrics exposed by my application's HTTP server every second.

In Grafana, to get the maximum value I simply display the metric's series whose quantile is 1.

The first problem I faced was that the maximum message length from previous intervals did not change until a bigger value arrived, so the maximum within a specific second was not shown if it was smaller than the maximum from earlier intervals. My solution was to set the Summary's maxAgeSeconds to 1 (the default is 10 minutes). That way the historical values are cleared every second, and I was able to fetch the relevant maximum for every 1-second interval.

After that I ran into a new problem. Because the data is cleared every second, when I send messages less frequently (for example, 1 message per second), the Prometheus server sometimes scrapes a NaN (null) value: the NaN that appears right after the data is cleared once maxAgeSeconds has elapsed and before the next message arrives.

As a result, my visualization in Grafana sometimes shows gaps where the message length drops to 0, corresponding to the NaN values that were scraped.
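To illustrate what I think is happening, here is a minimal sketch in plain Java. It is only a simplified model of a maximum over a time window, not the Prometheus client's actual implementation: the class `WindowedMax`, its methods, and the explicit millisecond timestamps are all hypothetical names I made up for the illustration. It shows that a read which happens after the window has expired, but before the next message arrives, has nothing to report and yields NaN.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of a "max over the last N ms" metric (not the Prometheus client API). */
final class WindowedMax {
    /** One observed value together with the time it was observed. */
    private static final class Sample {
        final long timeMillis;
        final double value;

        Sample(long timeMillis, double value) {
            this.timeMillis = timeMillis;
            this.value = value;
        }
    }

    private final long maxAgeMillis;
    private final Deque<Sample> samples = new ArrayDeque<>();

    WindowedMax(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Records a message length observed at the given time. */
    void observe(long nowMillis, double value) {
        samples.addLast(new Sample(nowMillis, value));
    }

    /** Returns the maximum of all samples younger than maxAgeMillis, or NaN if none remain. */
    double max(long nowMillis) {
        // Drop samples that have aged out of the window.
        while (!samples.isEmpty() && nowMillis - samples.peekFirst().timeMillis >= maxAgeMillis) {
            samples.removeFirst();
        }
        double max = Double.NaN;
        for (Sample s : samples) {
            if (Double.isNaN(max) || s.value > max) {
                max = s.value;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        WindowedMax metric = new WindowedMax(1000); // models maxAgeSeconds = 1
        metric.observe(0, 120);                     // message of 120 bytes at t = 0 ms
        System.out.println(metric.max(900));        // scrape inside the window: 120.0
        System.out.println(metric.max(1100));       // scrape after expiry, before next message: NaN
        metric.observe(1150, 80);                   // next message arrives slightly late
        System.out.println(metric.max(1900));       // 80.0
    }
}
```

With one message per second, the scrape and the next message race each other, so whether a given scrape sees a value or NaN depends on timing jitter, which matches the intermittent gaps I see.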

I have searched all over the internet for a way to handle these NaN values. I want my visualization in Grafana to show real data: for my real-time Java application, it is important to show the actual maximum received message length every second.

I want to emphasize that the messaging frequency can vary a lot: multiple messages within 1 second, 1 message every second, 1 message every minute, and so on.

So I need a real solution that handles the NaN values and works in all of these cases.

  • Are you pushing or pulling metrics with regard to Prometheus? Are you sure that you **need** monitoring with a resolution of 1 second? What is the actual query used in your Grafana panel? Also, it might be helpful if you edit this question and add a sample of the related metrics. – markalex Jun 04 '23 at 20:47
  • I don’t understand your question about pulling or pushing. All I can say is that my Java application runs an HTTP server that exposes the metrics on a port, and the Prometheus server reads the metrics from that HTTP server. I am sure that I need monitoring every second, because it is a real-time application and I need to know every second what is happening with the maximum received message length. The query is received_message_length_bytes(quantile=1.0) – Gal Brilovich Jun 05 '23 at 20:55
