
I need to create a Prometheus query that returns the number of events in one hour and/or one day.

When I run the following query and send requests, I see the number increase as expected: after 5 API calls the counter goes up by 5.

This is the basic query: vector_component_sent_events_total

Now I tried the following to get the number of requests for one hour, but I get wrong numbers (like 0.0.5, which is not the number of requests that I sent in 1 hour...)

I tried the following, what am I missing here?

sum(rate(vector_component_sent_events_total{component_id="ta", job="nt"}[1h]))

I got something like 0.004201679495327872, but I sent only 5 (or a bit more) requests in an hour... any idea?

https://vector.dev/docs/administration/monitoring/

PeterSO

2 Answers


If you want the increase of your metric over 1 hour, you could use this:

increase(vector_component_sent_events_total{component_id="ta", job="nt"}[1h])

or if you want an exact integer increase:

vector_component_sent_events_total{component_id="ta", job="nt"} - vector_component_sent_events_total{component_id="ta", job="nt"} offset 1h

with one drawback: the result is wildly incorrect near counter resets.

CAUTION: this is a hack and it could produce incorrect data in some circumstances.
In my personal experience, the best way to get an integer increase most of the time without losing data near counter resets is to use a query like this:

vector_component_sent_events_total{component_id="ta", job="nt"} - vector_component_sent_events_total{component_id="ta", job="nt"} offset 1h >= 0
 or increase(vector_component_sent_events_total{component_id="ta", job="nt"}[1h])
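The fallback logic of this combined query can be sketched in Python for a single time series. This is only an illustration of how the `>= 0` filter and the `or` operator interact, not how Prometheus evaluates it internally; the function name and the sample values are hypothetical.

```python
def hybrid_increase(current, hour_ago, increase_estimate):
    """Illustrative model of: (m - m offset 1h >= 0) or increase(m[1h]).

    current / hour_ago: counter values, or None when no sample exists.
    increase_estimate: the value increase(m[1h]) would return.
    """
    if current is not None and hour_ago is not None:
        delta = current - hour_ago
        if delta >= 0:
            # The ">= 0" filter keeps the exact integer difference.
            return delta
    # Negative or missing difference: "or" falls back to increase().
    return increase_estimate


normal = hybrid_increase(120, 105, 15.0078165711308)  # exact difference
after_reset = hybrid_increase(5, 100, 8.02)           # negative, so fallback
new_series = hybrid_increase(5, None, 5.01)           # no old sample, so fallback
print(normal, after_reset, new_series)
```

Note that if the counter resets and then grows past its value from an hour ago, the difference is positive again but no longer correct, which is presumably why this only suits counters that rarely reset.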
markalex
  • Thank you, +1. When I use increase I see something like `15.0078165711308`. Why? I expect it to be exactly `15`, the number of requests. Any idea how to avoid that? – PeterSO Mar 24 '23 at 07:35
  • @PeterSO, this behaviour is [documented](https://prometheus.io/docs/prometheus/latest/querying/functions/#increase) and you cannot avoid it when using `increase`. I added a sketchy workaround to get "the best of both worlds": an integer delta from `offset` and continuity from `increase`, but it should be used only for counters that rarely reset. – markalex Mar 24 '23 at 08:18
  • Sorry, I didn't explain myself well. I need the most accurate data and can't compromise on it. To verify: you are suggesting to use `increase` and ignore the digits after the decimal point, am I right? By the way, what are those digits, `.0078165711308`? – PeterSO Mar 24 '23 at 08:55
  • @PeterSO, `increase` will produce **mostly** correct values **at any time** (with rather small discrepancies). These discrepancies are a result of the inner workings of `increase`: you can read more about it in the documentation linked above and [here](https://stackoverflow.com/questions/38665904/why-does-increase-return-a-value-of-1-33-in-prometheus). `offset` will produce **exact** values **most of the time**. – markalex Mar 24 '23 at 09:23
  • Thanks a lot. It would be great if you could edit the answer to say which option you recommend most, as it's a bit unclear: as I understand it, the last example might not be accurate. – PeterSO Mar 24 '23 at 10:03
  • Those are three valid alternatives. And sadly, each of them has its own pitfalls. Anybody who sees this answer should decide what query to use by themselves. – markalex Mar 24 '23 at 10:09
  • As I need the `most accurate numbers`, I've decided to use the following: `floor(sum(increase(vector_component_sent_events_total{component_id="tra", job="ant"}[24h])))`. It gives the exact numbers, but I didn't test it heavily. WDYT? – PeterSO Mar 24 '23 at 10:14
  • Use `round()` instead. – markalex Mar 24 '23 at 10:20

sum(rate(m[d])) doesn't return the increase of m over the duration d. It returns the sum, across all time series matching m, of the average per-second increase rate over the lookbehind window d:

  • The rate() function returns the average per-second increase rate over the lookbehind window specified in square brackets.

  • The sum() function returns the sum over multiple input time series.
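
As a rough arithmetic check: because rate() is a per-second average, multiplying it by the window length in seconds gives approximately the increase over that window. A minimal Python sketch using the value from the question (the variable names are made up for illustration):

```python
# Per-second rate reported by sum(rate(...[1h])) in the question.
rate_per_second = 0.004201679495327872

# Length of the [1h] lookbehind window in seconds.
window_seconds = 60 * 60

# rate() is an average per-second increase, so scaling it by the window
# length recovers the (extrapolated) increase over that window.
approx_increase = rate_per_second * window_seconds

print(round(approx_increase, 2))  # prints 15.13
```

So the small number in the question corresponds to roughly 15 events per hour, not to a fraction of an event.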

If you need the increase of a counter metric over the last hour (see 1h in square brackets), then use the following query:

increase(vector_component_sent_events_total[1h])

This query uses the increase() function to calculate the increase of the given counter over the time range specified in square brackets. The function calculates the increase separately for each input time series.

If you need the total increase over the last hour across multiple time series, then wrap the query above in sum():

sum(increase(vector_component_sent_events_total[1h]))

Prometheus may return fractional or incomplete results from increase() over an integer counter because of the following issues:

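  • Extrapolation: Prometheus extrapolates the increase observed between the first and last samples in the lookbehind window to cover the whole window, which can yield fractional values.
  • Prometheus ignores the increase between the last raw sample just before the lookbehind window passed to increase() in square brackets and the first raw sample inside the lookbehind window.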
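
The extrapolation effect can be illustrated with a deliberately simplified Python model. This is not the exact Prometheus algorithm (which has extra rules about how far it extrapolates); the function name, the 15-second scrape interval, and the event times are all assumptions made for the example.

```python
def simplified_increase(samples, window_seconds):
    """Simplified model of increase(): NOT the exact Prometheus algorithm.

    samples: (timestamp_seconds, counter_value) pairs that fall inside the
    lookbehind window, ordered by time, with no counter resets.
    """
    if len(samples) < 2:
        return None  # not enough samples to compute a delta
    t_first, v_first = samples[0]
    t_last, v_last = samples[-1]
    raw_delta = v_last - v_first
    covered_seconds = t_last - t_first
    # Stretch the observed delta from the covered span to the full window.
    return raw_delta * window_seconds / covered_seconds


# Hypothetical data: 15 s scrape interval, 15 events spread over one hour.
event_times = [100 + 200 * k for k in range(15)]
samples = [(t, sum(1 for e in event_times if e <= t)) for t in range(0, 3600, 15)]

result = simplified_increase(samples, window_seconds=3600)
print(result)  # slightly above 15, although exactly 15 events happened
```

The samples span only 3585 of the 3600 seconds, so the raw delta of 15 is scaled up a little, which is the kind of fractional result seen with increase().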

If you need the exact increase of counter m over some duration d, then the query m - (m offset d) may work. But it returns incorrect results in the following cases:

  • When the counter m resets to zero within the given lookbehind window d. This usually happens when the application that exports the counter metric restarts.
  • When a counter matching m first appears inside the lookbehind window d. This usually happens when a new entity with an associated counter appears in the application.
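
The counter-reset case can be sketched in Python by comparing a plain subtraction with a reset-aware sum of deltas. The sample values and function names are hypothetical, and the reset handling is a simplified illustration of treating any drop in the counter as a restart from zero.

```python
def offset_difference(values):
    """Mimics m - (m offset d): last value minus first value."""
    return values[-1] - values[0]


def reset_aware_increase(values):
    """Sums the deltas, treating any drop as a counter reset to zero."""
    total = 0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # After a reset the counter restarted from 0, so the whole
            # current value counts as new increase.
            total += cur
    return total


# Hypothetical counter samples: the application restarts after reaching 103.
values = [100, 102, 103, 1, 4, 5]

naive = offset_difference(values)     # 5 - 100 = -95
aware = reset_aware_increase(values)  # 2 + 1 + 1 + 3 + 1 = 8
print(naive, aware)
```

The plain subtraction goes negative across the restart, while the reset-aware sum still reports the 8 events that were actually counted.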

Prometheus doesn't provide a working solution for these issues :(

P.S. If you still want to consistently obtain the exact increase of a counter metric over some lookbehind window, then try VictoriaMetrics, the Prometheus-like monitoring solution I work on. Its increase() function is free from the issues mentioned above.

valyala
  • Thanks a lot! We are using Prometheus. Does it make sense to run VictoriaMetrics in addition to Prometheus? I understand the pros; what are the cons, besides needing to install an additional Helm chart and being responsible for the LCM... – PeterSO Mar 26 '23 at 10:59
  • It is possible to substitute Prometheus with VictoriaMetrics - see [these helm charts](https://github.com/VictoriaMetrics/helm-charts). – valyala Mar 27 '23 at 05:41