Consider the following case:

One has a Kubernetes cronjob that runs on a schedule, say once every 30 seconds. Each run spins up a new pod and runs a Python script that captures some measurements and records them with the Otel Metrics SDK [ref] to be sent off to Prometheus. Specifically, the metrics are recorded with a Histogram instrument and represent latency values in ms.

One might expect the resulting metrics to accumulate accordingly - that is, the metricname_bucket, metricname_count, and metricname_sum series would increase monotonically over time. However, with the cronjob described above, this is not the case.

My current assumption is that because each run starts a new pod, the Otel instrument (along with its reader and exporter) is re-instantiated every time, so the metrics essentially start fresh. This leads to the question: does an Otel Metrics instrumentation need to persist in order to properly record metrics? In other words, are Otel instrumentations only applicable as part of always-on runtime platforms, like a web server?

Tried: Running a script as a k8s cronjob in which Prometheus metrics are recorded via an Otel Histogram instrument.

Expected: Metrics behave as monotonically increasing counters.

Actual: Metrics don't increase and instead "reset" with every run.

Gary Lang
  • the question needs sufficient code for a minimal reproducible example: https://stackoverflow.com/help/minimal-reproducible-example – D.L Mar 09 '23 at 22:09

1 Answer

does an Otel Metrics instrumentation need to persist in order to properly record metrics? In other words, are otel instrumentations only applicable as part of always-on runtime platforms, like a web server?

No. Instruments don't persist across processes (for many good reasons), and most likely never will. The Prometheus Python client behaves the same way: when a process terminates, all of its counters are gone, and the next process starts again from zero. It is the query backend's job to handle these resets. The OTLP protocol carries a start_time_unix_nano field on each data point, which indicates when the cumulative value started being recorded. If the backend sees a new start_time_unix_nano for the same time series, it should interpret that as a process restart and handle it accordingly.
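To make the reset handling concrete, here is a minimal sketch of how a backend could add up a cumulative series across process restarts. It is only an illustration under stated assumptions, not how Prometheus or any particular backend implements it: the `Sample` class and the `total_increase` function are hypothetical names, and only the `start_time_unix_nano` field name is taken from the protocol.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """One cumulative data point (hypothetical, loosely modeled on OTLP)."""
    start_time_unix_nano: int  # when the reporting process began accumulating
    value: float               # cumulative value since start_time_unix_nano


def total_increase(samples: Iterable[Sample]) -> float:
    """Sum the growth of one cumulative series across process restarts.

    A changed start_time_unix_nano, or a value that drops, is treated as a
    reset: the new process started from zero, so its whole value is growth.
    """
    total = 0.0
    prev: Optional[Sample] = None
    for sample in samples:
        is_reset = (
            prev is None
            or sample.start_time_unix_nano != prev.start_time_unix_nano
            or sample.value < prev.value
        )
        if is_reset:
            # New process (or first point): count everything it has recorded.
            total += sample.value
        else:
            # Same process: count only the growth since the previous point.
            total += sample.value - prev.value
        prev = sample
    return total


# Three cronjob runs, each a fresh process that records one observation,
# so each run reports a count of 1 with its own start time.
cronjob_runs = [Sample(100, 1.0), Sample(200, 1.0), Sample(300, 1.0)]
print(total_increase(cronjob_runs))
```

Under these assumptions, each short-lived run contributes its full value to the total even though the raw series never rises above 1, which is why the raw metricname_count appears to "reset" while a reset-aware query can still recover the overall count.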

Srikanth Chekuri