
If I use the following query:

topk(5,sum(container_memory_usage_bytes{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}) by (kubernetes_namespace,kubernetes_container_name))

it returns 5 results as expected.

However, with

topk(5,sum(irate(container_cpu_usage_seconds_total{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}[20s])) by (kubernetes_namespace,kubernetes_container_name))

around 18 results are returned. Any idea why this happens? And what do I need to change in the second query to get only the top 5?

Jorrit Salverda
  • If you need to obtain up to `k` time series with the maximum values by some specific criterion, then take a look at the `topk_*` functions from MetricsQL. For example, [topk_avg](https://docs.victoriametrics.com/MetricsQL.html#topk_avg) returns the top `k` series with the maximum averages on the selected time range. – valyala Oct 11 '21 at 11:17

4 Answers


I had the same issue. I switched on "Instant" for the query and got the correct number of results back.


Prometheus may return more than k time series from topk(k, ...) when building a graph in Grafana, because it independently selects the top k time series with the maximum values for each point on the graph. Each point on the graph may have its own set of top time series, so the final graph may contain more than k time series. There are two solutions for this issue:

  • Set up an instant query in Grafana. Then Grafana queries the /api/v1/query endpoint instead of the /api/v1/query_range endpoint. The /api/v1/query endpoint evaluates the query only at a single timestamp, so it consistently returns up to k time series from topk(k, ...).
  • Use one of the topk_* functions from MetricsQL, the PromQL-like query language from the VictoriaMetrics project I work on. For example, topk_max(k, ...) returns up to k time series with the maximum values on the selected time range, while topk_last(k, ...) returns up to k time series with the maximum values at the end of the selected time range.
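To make the per-point behaviour concrete, here is a small Python model of it. It is an illustration only, not Prometheus or MetricsQL code: the series names `a` to `d`, their values, and the helper functions are all hypothetical. `graphed_series` runs a top-k selection separately at every step, the way a range query evaluates `topk`, while `topk_over_range` picks the series once from a single score per series, which is the idea behind `topk_max` (score = maximum) and `topk_last` (score = last value).

```python
from typing import Callable, Dict, List, Set

# Hypothetical data: series name -> its value at each step of a range query.
Series = Dict[str, List[float]]


def topk_names(scores: Dict[str, float], k: int) -> Set[str]:
    """Names of the k highest-scoring series."""
    return set(sorted(scores, key=lambda name: scores[name], reverse=True)[:k])


def graphed_series(series: Series, k: int) -> Set[str]:
    """Model a range query: topk is evaluated separately at every step, and
    the graph shows every series that was in the top k at any step."""
    steps = len(next(iter(series.values())))
    shown: Set[str] = set()
    for step in range(steps):
        shown |= topk_names({name: values[step] for name, values in series.items()}, k)
    return shown


def topk_over_range(series: Series, k: int, score: Callable[[List[float]], float]) -> Set[str]:
    """Pick k series once for the whole range from one score per series
    (the idea behind topk_max/topk_last, not MetricsQL's implementation)."""
    return topk_names({name: score(values) for name, values in series.items()}, k)


cpu: Series = {
    "a": [6.0, 1.0, 1.0],
    "b": [1.0, 5.0, 1.0],
    "c": [1.0, 1.0, 4.0],
    "d": [3.0, 3.0, 3.0],
}

print(sorted(graphed_series(cpu, 2)))                    # ['a', 'b', 'c', 'd']
print(sorted(topk_over_range(cpu, 2, max)))              # ['a', 'b']
print(sorted(topk_over_range(cpu, 2, lambda v: v[-1])))  # ['c', 'd']
```

With k = 2, the per-step selection puts all four series on the graph, while the once-per-range selections return exactly two. An instant query corresponds to a single step, which is why it never returns more than k series.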
valyala

From the topk standpoint those are the same query; both should return no more than 5 results.

Would I be right in saying that you're not running this as a query, but actually as a graph? If so, exactly which 5 do you want chosen?

brian-brazil
  • It's used in a graph in Grafana, indeed. I guess in this case I'd like to see the top 5 with highest average cpu usage. – Jorrit Salverda Aug 05 '16 at 07:49
  • Just found ticket https://github.com/prometheus/prometheus/issues/586 about this exact issue, so I guess there's no good solution at this time. – Jorrit Salverda Aug 05 '16 at 07:52
  • @brian-brazil: This solution is great - https://www.robustperception.io/graph-top-n-time-series-in-grafana. However, this only shows the last topk. What if I wanted to show the topk per avg_on_interval/timestamp over a time range? – barakbd Oct 31 '19 at 00:09

The following solution may help anyone using Prometheus:

Formula: sum(irate(container_cpu_usage_seconds_total{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}[20s])) by (kubernetes_namespace,kubernetes_container_name)

If we are querying the last 3 hours, calculate topk over avg_over_time(formula[$__range:4h]), where $__range is Grafana's dashboard time range and 4h is the subquery resolution:

topk(5,avg_over_time(sum(irate(container_cpu_usage_seconds_total{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}[20s])) by (kubernetes_namespace,kubernetes_container_name)[$__range:4h]))

Then add the topk result, multiplied by 0, to the original formula:

Final Formula: sum(irate(container_cpu_usage_seconds_total{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}[20s])) by (kubernetes_namespace,kubernetes_container_name) + topk(5,avg_over_time(sum(irate(container_cpu_usage_seconds_total{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}[20s])) by (kubernetes_namespace,kubernetes_container_name)[$__range:4h]))*0

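Worked for me. Don't forget to multiply the topk result by 0: since + only keeps series whose label sets match on both sides, the result contains only the 5 selected series, and adding 0 leaves their values unchanged.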
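The `+ topk(...) * 0` trick relies on PromQL's default one-to-one vector matching: `left + right` only keeps samples whose label sets appear on both sides, and multiplying the right side by 0 leaves the left values unchanged. The Python sketch below models one evaluation step of the final formula; it is not Prometheus code, and the label values, CPU numbers, and helper names are hypothetical. It also simplifies the averaging window to a plain list of values, whereas in a real range query Prometheus evaluates the right-hand side at every step over a sliding `$__range` window, so the selected set can still change slowly across the graph.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical label set: (kubernetes_namespace, kubernetes_container_name).
Labels = Tuple[str, str]
Vector = Dict[Labels, float]


def topk_vector(vector: Vector, k: int) -> Vector:
    """Keep the k samples with the largest values, labels unchanged (like topk)."""
    ranked = sorted(vector.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def add_one_to_one(left: Vector, right: Vector) -> Vector:
    """Model `left + right` with default one-to-one matching: only label sets
    present on both sides appear in the result."""
    return {labels: value + right[labels] for labels, value in left.items() if labels in right}


def evaluate_step(history: Dict[Labels, List[float]], k: int) -> Vector:
    """Model one step of `formula + topk(k, avg_over_time(formula[...])) * 0`.

    Each list holds the formula's values over the averaging window; the last
    element is the current value (a simplifying, hypothetical layout).
    """
    current = {labels: values[-1] for labels, values in history.items()}
    averages = {labels: mean(values) for labels, values in history.items()}
    zeroed = {labels: value * 0 for labels, value in topk_vector(averages, k).items()}
    return add_one_to_one(current, zeroed)


# Hypothetical per-container CPU usage (cores) over the averaging window.
history = {
    ("prod", "api"): [0.9, 0.8, 0.7],     # highest average
    ("prod", "worker"): [0.5, 0.6, 0.4],  # second-highest average
    ("dev", "api"): [0.2, 0.2, 1.0],      # spikes only at the current step
}

print(evaluate_step(history, 2))
# {('prod', 'api'): 0.7, ('prod', 'worker'): 0.4}
```

Here a plain `topk(2, ...)` at this step would pick `dev/api` and `prod/api` (the highest current values), while the average-based filter keeps `prod/api` and `prod/worker`, with their current values unchanged.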

  • I'm guessing this is related to the issue and article referenced in another answer: https://github.com/prometheus/prometheus/issues/586 and https://www.robustperception.io/graph-top-n-time-series-in-grafana but it would be really helpful to include an explanation of how this works. – Spyros Mandekis Mar 28 '22 at 16:25