
We have a Kubernetes cluster with Prometheus installed (via the kube-prometheus-stack Helm chart). I have found that I can display many of the metrics in Prometheus, but there are some that never have any data.

Some examples (chosen randomly):

container_cpu_system_seconds_total
container_cpu_user_seconds_total

etcd_server_apply_duration_seconds

These have data:

container_cpu_usage_seconds_total

etcd_server_apply_duration_seconds_count
etcd_server_apply_duration_seconds_sum

Additionally, Prometheus (and Grafana) offer auto-completion on the query line, so that when you start typing they show matching metric names. Neither application does this for any of the "data-less" metrics. It's as though they don't know about them.

Is there a reason that these metrics are defined and yet nothing is ever collected for/from them?

Joseph Gagnon
  • Related, accepted answer here: https://stackoverflow.com/a/49126197/14072498 – Roar S. May 12 '23 at 19:12
  • The OP of that post indicated that they get a result of `0`. I get nothing at all. An empty query result. I will try this, although it seems very specific to CPU metric data. There are other metrics that don't appear (perhaps I'm wrong) to have anything to do with CPU or cAdvisor that also produce no data. – Joseph Gagnon May 12 '23 at 19:26
  • Lack of the mentioned metrics means that they were never scraped. Look into the documentation where "these metrics are defined": it could be that there are some prerequisites for them to be exposed. Also, check out the metrics page itself to see which metrics are present. – markalex May 12 '23 at 20:46
  • Prometheus stores the metric name just as it does all other labels: it uses the special label `__name__` (dunder-name-dunder). You can use this in PromQL to query metric names (and identify which ones have data and which don't), e.g. `{__name__=~"container_cpu_.+"}` will query for all metrics whose name begins with `container_cpu`. This will help you confirm whether it "knows" about a specific metric (see the example below the comment thread). It **is** confusing trying to determine which Kubernetes ecosystem components provide which metrics, but the `container_cpu` set are from Kubelet|cAdvisor and should be all or none. – DazWilkin May 12 '23 at 22:06
  • So I queried using the PromQL suggested by @DazWilkin and got a bunch of data, but only for metrics that were already producing data. The mentioned metrics above that return an empty result do not appear in the list. This is consistent with everything so far. The important question for me is: *why* is there no data? – Joseph Gagnon May 15 '23 at 11:58
  • I don't know where the `etcd_server` metrics originate. That e.g. `container_cpu_system_seconds_total` has never been ingested while its companion `container_cpu_usage_seconds_total` is measured suggests that [Kubelet/cAdvisor](https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics) scraping is enabled and that (!?) specific metrics are being excluded. I use a managed Prometheus Operator and I'm less familiar with the Helm deployment, but does the Helm Chart's `values.yaml` have `relabel_configs` that `action: drop` these metrics (see the sketch below the comment thread)? – DazWilkin May 15 '23 at 16:37
  • 1
    I added `container_cpu_.+_seconds_total` to a deployed (Managed) Prometheus solution. I receive ~same number of samples for each metric (which makes sense and suggests metrics are being explicitly excluded in your case). Additionally, these metrics have high (71) cardinality and so there's a likelihood that someone would be interested in limiting their ingress. – DazWilkin May 15 '23 at 16:41
  • `... does the Helm Chart's values.yaml have relabel_configs that action: drop these metrics?` I'll have to research that. I do not understand what you mean by "cardinality" in this context. – Joseph Gagnon May 15 '23 at 18:03
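
Following up on the `__name__` suggestion above, here is a minimal PromQL sketch that lists every ingested metric name matching the pattern together with its series count, i.e. the "cardinality" mentioned in the comments. Names absent from the result were never ingested at all:

    # Series count per ingested metric name; metrics that were never
    # scraped/ingested simply do not appear in the output.
    count by (__name__) ({__name__=~"container_cpu_.+"})

To follow markalex's suggestion of checking the metrics page itself, one option (assuming your RBAC allows the node proxy) is to dump the kubelet's cAdvisor endpoint through the API server, e.g. `kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/cadvisor | grep '^container_cpu_'`, with `<node-name>` as a placeholder. If a metric appears there but never in Prometheus, something between scraping and ingestion is dropping it.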
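
On the `relabel_configs` / `action: drop` question: a hypothetical sketch of what such an exclusion could look like in the chart's `values.yaml`. The `kubelet.serviceMonitor.cAdvisorMetricRelabelings` key and the regex are assumptions for illustration; the exact key can vary between kube-prometheus-stack versions, and a real deployment may carry a different rule (or none at all):

    # Hypothetical values.yaml fragment: a metric relabeling like this,
    # if present in the deployed values, drops matching series before ingestion.
    kubelet:
      serviceMonitor:
        cAdvisorMetricRelabelings:
          - sourceLabels: [__name__]
            regex: container_cpu_(system|user)_seconds_total
            action: drop

Comparing the values the release was actually deployed with (`helm get values <release-name> -n <namespace> --all`) against the running scrape configuration (Status → Configuration in the Prometheus UI) would show whether a rule like this is what is excluding the metrics.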

0 Answers