30

When deciding between Counter and Gauge, the Prometheus documentation states:

To pick between counter and gauge, there is a simple rule of thumb: if the value can go down, it is a gauge. Counters can only go up (and reset, such as when a process restarts).

They seem to cover overlapping use cases: you could use a Gauge that only ever increases. So why even create the Counter metric type in the first place? Why don't you simply use Gauges for both?

Jose Armesto

4 Answers

35

From a conceptual point of view, gauges and counters have different purposes:

  • a gauge typically represents a state, usually with the purpose of detecting saturation.
  • the absolute value of a counter is not really meaningful; its real purpose is to compute an evolution (usually a utilization) with functions like irate(), rate(), increase() ...

Those evolution operations require a reliable computation of the increase, which you cannot achieve with a gauge because you need to detect resets of the value.

Technically, a counter has two important properties:

  1. it always starts at 0
  2. it always increases (i.e. incremented in the code)

If the application restarts between two Prometheus scrapes, the value of the second scrape is likely to be less than that of the previous scrape, so the reset can be detected and the increase recovered (only partly, because you will always lose the increase between the last scrape and the reset).

A simple algorithm to compute the increase of counter between scrapes from t1 to t2 is:

  • if counter(t2) >= counter(t1) then increase = counter(t2) - counter(t1)
  • if counter(t2) < counter(t1) then increase = counter(t2)
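
The two rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Prometheus implements rate() or increase(); the function names `counter_increase` and `total_increase` and the sample values are made up for the example.

```python
from typing import Sequence


def counter_increase(previous: float, current: float) -> float:
    """Increase between two consecutive scrapes of a counter."""
    if current >= previous:
        return current - previous
    # The value dropped, so assume the process restarted and the counter
    # began again from 0: everything in `current` was counted after the reset.
    return current


def total_increase(samples: Sequence[float]) -> float:
    """Sum the pairwise increases over a series of scraped values."""
    return sum(counter_increase(a, b) for a, b in zip(samples, samples[1:]))


# Hypothetical scrapes: the application restarted between the 2nd and 3rd one.
scrapes = [10, 15, 3, 7]
print(total_increase(scrapes))  # 5 + 3 + 4 = 12
```

Whatever was counted between the scrape that returned 15 and the restart is not in the result, which is the loss mentioned above.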

In conclusion, from a technical point of view, you can use a gauge instead of a counter provided you reset it to 0 at startup and only ever increment it, but any violation of that contract will lead to wrong values.

As a side note, I also expect a counter implementation to use an unsigned integer representation, while a gauge would rather use a floating-point representation. This has some minor impacts on the code, such as the ability to overflow to 0 automatically and better support for atomic operations on current CPUs.

Michael Doubez
  • 1
    Thanks for the response. Let me paraphrase your response to see if I got it. Prometheus doesn't really care which type of metric we send it. We choose `Counter` or `Gauge` depending on how we are going to use the metric, so that we can assume certain properties. Similar to how we use a private method while programming: technically, it doesn't have to be private, but we do it to help others think about the code. Can we also say that we should use `Counter` when we want to aggregate its value? We won't add CPU usage, which is a `Gauge`, but we may want to add total requests. – Jose Armesto Nov 04 '19 at 10:47
  • 2
    Exactly. I might add that the [Textfile format](https://prometheus.io/docs/instrumenting/exposition_formats/#comments-help-text-and-type-information) exposes type information. Ex: `# TYPE http_requests_total counter`. This can help when discovering metrics exposed or could be instrumented for sanity checks. – Michael Doubez Nov 05 '19 at 14:54
  • "Can we also say that we should use Counter when we want to aggregate its value?" - no, depending on the meaning of the metrics and what you want to get out of it, you'll use different aggregation function (avg/min/max/...). The rule of thumb given in the documentation is sensible. – Michael Doubez Nov 05 '19 at 15:00
8

An astute observation in this regard is:

The feeling behind Gauge is that:

Gauge is appropriate Iff SUM operation on the measurements does not make sense for any time interval

For example, if the Hubble Space Telescope is looking at the brightness of every star it observes in its celestial sweep, the sum of those brightness values would produce no valuable information whatsoever.

Similarly for a bank balance. The SUM of your bank balance every day is not a meaningful indicator of wealth. So use a gauge for this: the average over an interval is available for a gauge.


The rate() issue is more a technicality of the rate() function than something inherent to gauges and counters.

The culprit is that rate() is over-smart in detecting resets. There appears to be no mathematical reason why a simple rate cannot be computed on a gauge.
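
To make the point concrete, here is a rough Python sketch of such a "simple rate": the change between the first and last sample divided by the elapsed time, with no reset handling at all. The name `simple_rate` and the sample data are invented for illustration. For what it is worth, PromQL's delta() and deriv() are the functions intended for gauges.

```python
def simple_rate(samples):
    """Per-second change between the first and last (timestamp, value) sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical series of (seconds, value) pairs.
steady = [(0, 100), (30, 130), (60, 160)]
restarted = [(0, 100), (30, 130), (60, 20)]  # value started over after t=30

print(simple_rate(steady))         # 1.0
print(simple_rate(restarted) < 0)  # True: the drop is read as a real decrease
```

The arithmetic works on any series. Without the "only goes up, restarts at 0" contract, though, a drop can only be taken at face value, which is right for a gauge and wrong for a restarted counter.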

dsculptor
6

For counters you care about how fast it is increasing, whereas for gauges you care about the actual value. While there can be gauges that (in theory) only go up, that doesn't make them counters.

brian-brazil
1

In your application you typically use a Prometheus client library, but you can of course also track metrics yourself and export them on your own HTTP metrics endpoint.

Let's say we made an application that is processing something and you want to export one metric about how many objects were processed in total (since the app was started). Another metric could be how many threads are currently working on processing something. Both are integers we can export via an HTTP metrics endpoint for Prometheus, looking like this:

myapp_processed_total 23
myapp_processing_current_threads 8

Since we know the first one, myapp_processed_total, only counts upwards, we might want to declare it as a counter. The second one, myapp_processing_current_threads, can move up and down to indicate how many threads are currently in use. We might want to declare it as a gauge.

On our http-metrics-endpoint these are simply "comments" or annotations with # TYPE:

# TYPE myapp_processed_total counter
myapp_processed_total 23
# TYPE myapp_processing_current_threads gauge
myapp_processing_current_threads 8

Prometheus or, even more importantly, another administrator can use that information to create meaningful dashboards on the collected data, read the annotations by hovering with the mouse, query with the rate() function on counters but not on gauges, for example... If you build the metrics from scratch, both are just numbers though. But it sure is good practice to add the # TYPE information (and the # HELP information too) for further usage of your metrics.

Subresults can be defined with labels (key-value tags) in curly brackets, by the way. For example, when we want to differentiate between how many objects are processed successfully and how many are processed and aborted due to an error, we might want to do something like this:

# HELP myapp_processed_total some useful information here
# TYPE myapp_processed_total counter
myapp_processed_total{status="success"} 20
myapp_processed_total{status="failure"} 3
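
If you really do build the endpoint yourself, producing that text takes only a few lines. The following Python sketch uses a hypothetical helper, `render_metric`, that is not part of any Prometheus library; it assumes label values need no escaping and leaves out the HTTP serving part.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric in the text format shown above.

    `samples` maps a tuple of (label, value) pairs to the sample value;
    an empty tuple means the sample has no labels.
    Assumption: label values contain no quotes, backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples.items():
        if labels:
            label_text = ",".join(f'{key}="{val}"' for key, val in labels)
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "myapp_processed_total",
    "counter",
    "some useful information here",
    {(("status", "success"),): 20, (("status", "failure"),): 3},
))
```

This prints the same four lines as the example above. A client library does this for you and also handles escaping, which is one more reason to prefer it.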

And another administrator might create meaningful dashboards and queries on your metrics data later.

To summarize:

  • yes, both counters and gauges are just numbers in your application
  • but specifying the type helps with further processing of your metrics
  • consider using an existing Prometheus client library
zeg