
I'm considering exporting some metrics to Prometheus, and I'm getting nervous about what I'm planning to do.

My system consists of a workflow engine, and I'd like to track some metrics for each step in the workflow. This seems reasonable, with a gauge metric called wfengine_step_duration_seconds. My issue is that there are many thousands of steps across all my workflows.

According to the documentation here, I'm not supposed to programmatically generate any part of the name. That precludes, then, the use of names such as wfengine_step1_duration_seconds and wfengine_step2_duration_seconds, because the step names are programmatic (they change from time to time).

The solution, then, is a label for the step names. This also presents a problem, though, because the documentation here and here cautions quite strongly against using labels with high cardinality. Specifically, they recommend keeping "the cardinality of your metrics below 10", and for cardinality over 100, to "investigate alternate solutions such as reducing the number of dimensions or moving the analysis away from monitoring".
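
Concretely, here's a sketch of what I'm planning (using the Go client library; the step names below are just placeholders for the real, programmatically generated ones):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// One gauge, with the step name carried as a label rather than baked into
// the metric name.
var stepDuration = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "wfengine_step_duration_seconds",
		Help: "Duration of the last execution of each workflow step.",
	},
	[]string{"step"},
)

func main() {
	prometheus.MustRegister(stepDuration)

	// Placeholder step names; in reality these come from the workflow
	// definitions and number in the low thousands.
	stepDuration.WithLabelValues("step1").Set(1.42)
	stepDuration.WithLabelValues("step2").Set(0.07)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```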

I'm looking at a number of label values in the low thousands (1,000 to 10,000). Given that the number of metrics otherwise won't be extremely large, is this an appropriate usage of Prometheus, or should I limit myself to more generic metrics, such as a single aggregated step duration instead of individual duration for each step?

Mark

2 Answers


High-cardinality labels (i.e. labels with a large number of unique values) aren't dangerous on their own. The danger is in the total number of active time series. According to https://www.robustperception.io/why-does-prometheus-use-so-much-ram, a single Prometheus instance can handle up to ten million active time series when running on a host with >100GB of RAM.

An example: suppose the exported metric has a step_id label with 10K unique values.

If the metric has no other labels (i.e. it is exported as wfengine_duration_seconds{step_id="..."}), then it will generate 10K active time series - a tiny number for Prometheus.

If the metric contains another label such as workflow_id with 100 unique values and each workflow has 10K unique steps, then the total number of exported time series skyrockets to 100*10K=1M. This is still a pretty low number of active time series for Prometheus.
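
As a sketch (Go client library assumed; metric and label names follow the example above), such a two-label metric is declared once, and every distinct (workflow_id, step_id) combination that ever gets set becomes its own active time series:

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical two-label version of the metric from the example above.
// Every distinct (workflow_id, step_id) pair that is ever set becomes one
// active time series, so 100 workflows x 10K steps = 1M series per instance.
var stepDuration = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "wfengine_duration_seconds",
		Help: "Duration of the last execution of each workflow step.",
	},
	[]string{"workflow_id", "step_id"},
)

func recordStep(workflowID, stepID string, d time.Duration) {
	stepDuration.WithLabelValues(workflowID, stepID).Set(d.Seconds())
}

func main() {
	prometheus.MustRegister(stepDuration)
	recordStep("wf42", "step7", 1300*time.Millisecond) // placeholder values
}
```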

Now suppose that the app which exports the metric runs on 50 hosts (or Kubernetes pods). Prometheus stores the scrape target address in the instance label - see these docs. This means that the total number of active time series collected from those 50 hosts jumps to 50*1M=50M. This number may be too big for a single Prometheus instance. There are other systems that can handle such an amount of active time series in a single-node setup, but they also have an upper limit - it is just N times bigger (1 < N < 10).

So the rule of thumb is to take into account the total number of active time series, not the number of unique values of a single label.
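
To see where you actually stand, Prometheus itself exports its current number of active (head) series as prometheus_tsdb_head_series, which you can read over its HTTP API. A minimal sketch (the localhost:9090 address is an assumption; adjust it to your setup):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// PromQL query for the number of series currently in the TSDB head block.
	q := url.QueryEscape("prometheus_tsdb_head_series")
	resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + q)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON result containing the active-series count
}
```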

valyala

The guideline of keeping the cardinality of your biggest metrics under 100 presumes that you have up to 1,000 replicas of your service, as that's a reasonably safe upper bound. If you know that everyone using this code will always have a smaller number of replicas, then there's scope for higher cardinality in the instrumentation.

That said, thousands of label values is still something to be careful with. If it's already approaching tens of thousands, how long before it's hundreds of thousands? Long term you'll likely have to move this data to logs given the cardinality, so you may wish to do so now.

brian-brazil
  • If one didn't distinguish between these replicas with labels, though, then what difference does it make how many replicas there are? – Mark Sep 23 '17 at 12:06
  • The more I think about it, the more that limitation doesn't make sense (or I'm misunderstanding something). For example, for a hypothetical CPU usage metric, do you put the hostname in the metric name (programmatically), use a label (and therefore limit yourself to 10 or 100 servers) or not break out the metric per server at all (and therefore lose the ability to fix a broken server)? – Mark Sep 23 '17 at 14:41
  • Cardinality is cardinality, whether it is in metrics or labels. – brian-brazil Sep 23 '17 at 15:30
  • So is Prometheus unsuited to monitoring more than 100 (or 10) machines? – Mark Sep 24 '17 at 00:39
  • A single Prometheus can monitor thousands to tens of thousands of machines, depending on the setup. – brian-brazil Sep 24 '17 at 06:56
  • Wouldn't this mean that some hypothetical "hostname" label would have a higher cardinality than 10 or 100, then? – Mark Sep 24 '17 at 13:46
  • The `instance` label is considered in these numbers. – brian-brazil Sep 25 '17 at 09:31
  • I'm not sure what you mean by that. Considered in what numbers? – Mark Sep 25 '17 at 14:01
  • @Mark I think the suggestion is that the cardinality of a metric should not exceed 10,000 or 100,000, _including_ the `instance` label (your hypothetical `hostname` label), but I get the strong impression that no one is quite sure what is safe, or has ever measured it – jberryman Jan 22 '18 at 15:54
  • @jberryman Aren't you way off? The suggestion is actually 100, not 10,000 or 100,000 – user1870400 Dec 29 '20 at 23:05