How do I track sporadic data with prometheus in nodejs?

Question

I am using prom-client in nodejs to publish a /metrics endpoint. I want to monitor sales of varying amounts which occur sporadically over time.

What is the best way to track a sporadic or discontinuous metric in prometheus? None of the existing metric types seem to be a good fit.

The basic prometheus metric type for tracking a single value (Gauge) is geared towards continuous data (such as CPU speed or concurrent requests).
The Histogram metric can capture discontinuous data, but requires manually defined buckets and apparently only estimates quantiles (https://prometheus.io/docs/practices/histograms/#errors-of-quantile-estimation). Also, the counts are wiped out when the metrics server restarts.
The Summary metric can capture discontinuous data, but is “in general not aggregatable” (https://latencytipoftheday.blogspot.com/2014/06/latencytipoftheday-you-cant-average.html).
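On the restart point: a restart does reset the values prom-client holds in memory. However, PromQL's `rate()` and `increase()` treat any decrease of a counter as a reset. That means counter-based series, including a Histogram's `_bucket`, `_sum` and `_count`, keep working across restarts, and only increments made between the last scrape and the restart are lost. Below is a simplified sketch of that reset handling in plain JavaScript. The function name and the sample values are made up for illustration:

```javascript
// Simplified model of how PromQL's increase() handles counter resets.
// The real function also extrapolates to the edges of the range; that is omitted here.
function increaseWithResets(values) {
  let total = 0
  for (let i = 1; i < values.length; i++) {
    const delta = values[i] - values[i - 1]
    // A drop means the process restarted and the counter started again from 0,
    // so the whole new value counts as growth since the reset.
    total += delta >= 0 ? delta : values[i]
  }
  return total
}

// Hypothetical scraped values of a sales counter; the app restarted
// between the 3rd and 4th scrape.
const scraped = [0, 35, 75, 10, 50]
console.log(increaseWithResets(scraped)) // 125
```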

Here is a simple setup with a Gauge, which obviously does not capture every individual sale:

```javascript
import express from 'express'
import promClient, { Gauge } from 'prom-client'

export const someMetric = new Gauge({
  name: 'some_metric',
  help: 'Track some metric; type = [a, b, c]',
  labelNames: ['one', 'two'],
})

const metricServer = express()
metricServer.get('/metrics', async (req, res) => {
  console.log('Metrics scraped')
  res
    // prom-client provides the exact exposition-format content type
    .set('Content-Type', promClient.register.contentType)
    .send(await promClient.register.metrics())
})

// intermittent callback that reports sales
// (`service` stands for the application's own source of sale events, not shown here)
service.onSale(value => {
  // this will simply overwrite the previous sale :(
  someMetric.labels('a', 'all').set(value)
})

metricServer.listen(9991, () =>
  console.log(`Prometheus listening on http://localhost:9991/metrics`)
)
```

My current plan is to create a new database to internally track a rolling 24-hour average of sales and then expose that as a single continuous metric to Prometheus. It seems awkward, though, to keep a rolling average internally on top of Prometheus’s own aggregation capabilities.
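For comparison, the rolling-window part of that plan could live in memory rather than in a database. The sketch below is minimal and assumes that sales arrive in time order and that a day's worth of them fits in memory. `RollingWindow` is a hypothetical helper, not part of prom-client. It reports the moving sum and the count alongside the average:

```javascript
// Hypothetical in-memory rolling window (not part of prom-client).
// Keeps every sale from the last `windowMs` milliseconds so that a moving
// sum, a sale count and an average can be derived at any time.
class RollingWindow {
  constructor(windowMs) {
    this.windowMs = windowMs
    this.events = [] // [{ t, value }], assumed to be added in time order
  }

  // Record one sale at time `t` (milliseconds).
  add(value, t = Date.now()) {
    this.events.push({ t, value })
  }

  // Drop sales that fall outside the window ending at `now`.
  evict(now) {
    const cutoff = now - this.windowMs
    while (this.events.length > 0 && this.events[0].t <= cutoff) {
      this.events.shift()
    }
  }

  stats(now = Date.now()) {
    this.evict(now)
    const count = this.events.length
    const sum = this.events.reduce((acc, e) => acc + e.value, 0)
    return { sum, count, avg: count > 0 ? sum / count : null }
  }
}

const DAY_MS = 24 * 60 * 60 * 1000
const sales = new RollingWindow(DAY_MS)
sales.add(10, 0)
sales.add(30, 1000)
console.log(sales.stats(2000))         // { sum: 40, count: 2, avg: 20 }
console.log(sales.stats(DAY_MS + 500)) // { sum: 30, count: 1, avg: 30 }
```

To expose the result, the `/metrics` handler could call `set()` on a Gauge with a value from `stats()` right before `register.metrics()`. Like any in-process state, the window is lost when the process restarts.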

I cannot comment on Prometheus, as I didn't care for it and implemented InfluxDB instead. However, your current plan sounds reasonable if there's no explicit data type that helps you. I would consider using a moving sum rather than an average: the average is a low-pass filter, and your "sporadic" data will look like a high-frequency spike to that filter. — wbg, Nov 15 '21 at 18:52
Why not aggregate over time? You can still use gauges for that even if the data isn't continuous. Your intervals just need to be large enough to capture the data you want. — juanecabellob, Nov 15 '21 at 20:31
@juanecabellob Aggregating with Prometheus or internally? Perhaps you can explain in a full answer. Thanks! — Raine Revere, Nov 15 '21 at 21:06
@RaineRevere Before I do that, what exactly would you like to track the data for? Alerts or plotting? — juanecabellob, Nov 15 '21 at 23:00

juanecabellob · Answer 1 · 2021-11-16T20:33:26.463

Without knowing exactly what the purpose of capturing this data is, it's hard to tell whether a Gauge, Summary or Histogram would best fit your needs, but I'll do my best based on my assumptions. First, though, let's begin with a simplified picture of what Prometheus does; that may help you see where I'm headed.

Prometheus is a time series database. That means that every time your data gets scraped, it stores a snapshot of your metrics and their recorded values at that timestamp, so in a very simplified form you end up with entries like <timestamp, your_metric{label="1"} value>.

Assuming that what you want is to capture only the amount of money paid during a sale and that you have a finite number of customers, Gauges can show you the paid amount at any given time, differentiating the customers by label* (though a counter would do just fine too).
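To make the snapshot model concrete, here is a toy simulation in plain JavaScript. No Prometheus is involved, and all names are hypothetical. It shows what a scraper stores for two alternatives: a gauge holding the last sale, versus two counters holding the running total amount and the running number of sales:

```javascript
// Toy model of scraping (plain JavaScript, not Prometheus): the scraper only
// stores the value each metric holds at the moment of the scrape.
function simulateScrapes(salesPerInterval) {
  let lastSale = 0    // gauge: overwritten by every sale
  let amountTotal = 0 // counter: running total of all sale amounts
  let salesTotal = 0  // counter: running number of sales
  const scrapes = []  // what is stored at each scrape timestamp

  salesPerInterval.forEach((sales, t) => {
    for (const amount of sales) {
      lastSale = amount
      amountTotal += amount
      salesTotal += 1
    }
    // Scrape at the end of interval t.
    scrapes.push({ t, lastSale, amountTotal, salesTotal })
  })
  return scrapes
}

// Two sales before the 1st scrape, none before the 2nd, one before the 3rd.
const scrapes = simulateScrapes([[10, 25], [], [40]])
console.log(scrapes)
```

The gauge never shows the sale of 10, and it still reads 25 at the second scrape although nothing was sold in between. The counters lose nothing. The differences between consecutive scrapes are 35, 0 and 40 for the amount and 2, 0 and 1 for the number of sales, which is the kind of per-window figure that PromQL's `increase()` derives from counters.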

Now, your question was about keeping track of the data. Plotting this shouldn't be an issue: even though the data is not continuous, any plotting tool, e.g. Grafana, will display it. However, isolated dots (<timestamp, value of your metric for each label combination>) or short line segments don't tell much of a story, which makes them almost meaningless and hard to follow. To make this data continuous, you can aggregate over time: instead of aggregating the values at each individual timestamp, you aggregate them across a selected time window.

Let's try to visualize this:

Prometheus scrapes the data every 2 seconds. In 30 minutes, your gauge records only 4 sales: two at minute 1 from two different customers and two at minute 20 from two different customers. If you plot this as is, you'll see 4 dots. If you aggregate it, e.g. by average, you'll see 2 dots, at minute 1 and at minute 20, each containing the average of the two sales.

If you'd like to see a continuous story, e.g. the average of sales over a given time period, you need to aggregate over time. The crucial difference: at any plotted point, you see the value aggregated over the selected time window ending at that timestamp. So if, in the example above, you used avg_over_time instead of avg with a 30-minute window, you'd get no value (not 0) until minute 1; from minute 1 until minute 20 you'd see the average of the two sales that happened at minute 1; from minute 20 to minute 31 (30 minutes after the two sales at minute 1), the average of all 4 sales; from minute 31 to minute 50, the average of the last 2 sales; and from minute 50 on, no value again.

If you select a larger time window, like 24 hours, you get the same effect. Just bear in mind that the larger the window, the more computationally intensive the query is for the Prometheus database, and having a lot of labels*, each with many different values, will make such time windows very slow. The query for this would look like:

*I emphasize the importance of a metric's cardinality: the more labels you add to a metric, the more entries Prometheus has to go through for its calculations, since it creates a separate time series for each label combination.
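The avg_over_time walk-through above can be checked with a toy model. The model assumes, as the answer does, that each sale yields exactly one sample at the minute it happens; a real prom-client Gauge would instead keep reporting its last value at every scrape. It uses made-up amounts of 10 and 30 at minute 1 and 50 and 70 at minute 20:

```javascript
// Toy model of the walk-through: one sample per sale, timestamps in minutes.
// The amounts are made up for illustration.
const saleSamples = [
  { minute: 1, value: 10 },
  { minute: 1, value: 30 },
  { minute: 20, value: 50 },
  { minute: 20, value: 70 },
]

// Mimics avg_over_time(metric[windowMinutes]) evaluated at minute `at`:
// the average of all samples in the window ending at `at`.
// (How the left edge of the window is treated differs between Prometheus
// versions; the minutes evaluated below stay away from it.)
function avgOverTime(samples, at, windowMinutes) {
  const inWindow = samples.filter(
    (s) => s.minute > at - windowMinutes && s.minute <= at
  )
  // No samples in the window: Prometheus returns no value here, not 0.
  if (inWindow.length === 0) return null
  return inWindow.reduce((acc, s) => acc + s.value, 0) / inWindow.length
}

for (const at of [0, 10, 25, 40, 55]) {
  console.log(`minute ${at}:`, avgOverTime(saleSamples, at, 30))
}
// minute 0: null, minute 10: 20, minute 25: 40, minute 40: 60, minute 55: null
```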

Thanks for the detailed answer! What happens if I get two sales from the same customer within a single scrape window? If I use a Gauge, won't the first one get overwritten? I would have to record the sum of all sales within a scrape window, but then I would lose information about how many distinct sales were made. — Raine Revere, Nov 16 '21 at 20:21
I.e., I'm still fundamentally confused, not about how to query the data in Prometheus, but about how to capture data from discontinuous events that may be numerous relative to the scrape window. It makes me think I should use a Histogram and not worry about being able to distinguish individual sale amounts. — Raine Revere, Nov 16 '21 at 20:22
Yes, it will be overwritten. In that case, you could either use a gauge and distinguish different sales by some sort of unique ID, which may backfire in terms of performance, or use a histogram, which, as you pointed out, loses the individual sale amounts. Based on what you're saying, you may be better off with a Histogram. — juanecabellob, Nov 16 '21 at 20:32
