
I have a job whose latency I would like to track with Prometheus, with source values like this:

timestamp | latency
-------------------
0         | 15ms
1         | 20ms
2         | 18ms
5         | 22ms
6         | 30ms
8         | 5ms

Currently I use a CloudWatch Insights query to get statistics for this latency data. For example, for the above data, I can calculate the min, max, average, and standard deviation over the last 5 minutes:

| stats avg(interval), stddev(interval), min(interval), max(interval) by bin(5m)

And this gives me a nice little graph of these statistics (CloudWatch screenshot).

In Prometheus, I am currently using two counters:

  • duration_seconds_total
  • duration_count_total

These allow me to calculate the average latency over 5 minutes:

rate(duration_seconds_total[5m]) / rate(duration_count_total[5m])
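
For context, my instrumentation is roughly the following sketch (Python prometheus_client; the port and job body are simplified placeholders, and the client appends "_total" to counter names, so these are exposed as duration_seconds_total and duration_count_total):

from prometheus_client import Counter, start_http_server
import time

# Exposed as duration_seconds_total / duration_count_total once the
# client appends the "_total" suffix.
DURATION_SECONDS = Counter("duration_seconds", "Total job latency in seconds")
DURATION_COUNT = Counter("duration_count", "Total number of job runs")

def observe_latency(latency_seconds):
    # Record one run: add its latency to the running sum, bump the count.
    DURATION_SECONDS.inc(latency_seconds)
    DURATION_COUNT.inc()

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    while True:
        start = time.monotonic()
        time.sleep(0.02)  # stand-in for the real job
        observe_latency(time.monotonic() - start)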

But what about max, min, standard deviation?

I know I could do this with a Gauge and max/avg/min_over_time, but this would lose fidelity if requests arrive more often than my scrape interval.
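
Concretely, with a hypothetical gauge named duration_seconds, that approach would look like:

max_over_time(duration_seconds[5m])
min_over_time(duration_seconds[5m])
avg_over_time(duration_seconds[5m])
stddev_over_time(duration_seconds[5m])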

Is the correct approach here to use a Histogram / Summary? The median would then be roughly p50 (assuming good bucket choices), but how would I calculate min, max, or standard deviation? Is this possible with Prometheus?
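
For reference, here is a sketch of the histogram instrumentation I am considering (hypothetical metric name and buckets; the buckets would need to bracket my 5-30ms latencies):

from prometheus_client import Histogram
import time

JOB_LATENCY = Histogram(
    "job_duration_seconds",
    "Job latency in seconds",
    buckets=(0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.05, 0.1),
)

def run_job():
    with JOB_LATENCY.time():  # observes the elapsed time in seconds
        time.sleep(0.02)  # stand-in for the real job

This exposes job_duration_seconds_bucket, _sum, and _count, and I could estimate p50 with:

histogram_quantile(0.5, sum by (le) (rate(job_duration_seconds_bucket[5m])))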

