
Disclaimer: I'm new to PromQL and may have constructed a query that is completely wrong. I'm using PromQL because the built-in monitoring of Google Cloud does not produce the expected results (see this post).

When I use PromQL for a custom metric in Google Cloud, I run into several issues:

  • my query generates too many data points (most likely something is wrong with the query below?)
  • when I zoom out, all data disappears or some aggregates are gone; the counts drop from 1k to 50 for no apparent reason
  • when I use a 1m interval, the data disappears
  • it is unclear how user-based distribution metrics should be queried; the docs are missing examples (it seems 3 submetrics are generated: bucket, count and sum, but I have no clue how to use them)

I mainly want the sum of a log-based metric whose entries each contain a number of API hits, like:

hits: 14
hits: 56
...

Each interval should just show the sum of these numbers.

The query:

sum by (dag_id) (
sum_over_time(logging_googleapis_com:user_metric_sum[30m])
)

Example

Zoomed in version: (image)

Data is gone when zooming out: (image)

Multiple data points when zoomed in: (image)

Sums are wrong when zooming out (note the y-axis changing): (image)

c69
Jonny5
  • I'm not familiar with gc-monitoring, but in Grafana, for example, you shouldn't use `sum_over_time(logging_googleapis_com:user_metric_sum[30m])`; just use `sum by (dag_id) (logging_googleapis_com:user_metric_sum)`, and it will automatically aggregate the data to show. Could that be the source of the problem? – markalex Apr 14 '23 at 08:06
  • `[30m]` means "the most recent 30 minutes of data". As you change the graph's range (April 12th, 17th and 14th), you're changing the endpoint, and so you're changing the timeframe over which data for the last 30 minutes is being collected. – DazWilkin Apr 14 '23 at 16:30
  • You could try (!?) `[1w:30m]`, which would give you "for the last week (`1w`) with a resolution of 30 minutes (`30m`)". – DazWilkin Apr 14 '23 at 16:33
  • `sum_over_time(some_metric[1w:30m])` then sums every 30-minute measurement for the past week. – DazWilkin Apr 14 '23 at 16:34
  • [subquery](https://prometheus.io/docs/prometheus/latest/querying/basics/#subquery)'ing is a relatively recent addition to PromQL. I discovered recently that [Google-Managed Prometheus doesn't appear to support PromQL `@` modifier](https://issuetracker.google.com/issues/275597713) – DazWilkin Apr 14 '23 at 16:37
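
The windowing behaviour described in the comments above can be sketched outside of PromQL. The following is a minimal Python simulation, not Google Cloud's implementation: the sample data, the helper names `sum_over_time` and `range_query`, and the left-open window `(t - window, t]` are all assumptions made for illustration. It shows how a fixed 30-minute window evaluated at every graph step may over-count when the step is shorter than the window and skip samples when the step is longer.

```python
# Hypothetical data: one sample worth 1 hit every 10 minutes (t = 10 .. 240).
samples = [(t, 1) for t in range(10, 241, 10)]


def sum_over_time(samples, end, window):
    """Sum the samples whose timestamp falls in (end - window, end]."""
    return sum(v for t, v in samples if end - window < t <= end)


def range_query(samples, start, end, step, window):
    """Evaluate the windowed sum once per step, as a graph panel would."""
    return [(t, sum_over_time(samples, t, window))
            for t in range(start, end + 1, step)]


def graph_total(points):
    """Add up the values of all plotted points."""
    return sum(v for _, v in points)


total = sum(v for _, v in samples)                       # true number of hits
zoomed_in = range_query(samples, 30, 240, step=10, window=30)
aligned = range_query(samples, 30, 240, step=30, window=30)
zoomed_out = range_query(samples, 30, 240, step=60, window=30)

print(total)                    # 24
print(graph_total(zoomed_in))   # 66: each sample is counted in up to 3 windows
print(graph_total(aligned))     # 24: windows tile the range exactly
print(graph_total(zoomed_out))  # 12: half of the samples fall between windows
```

In this sketch the plotted values only add up to the true total when the step equals the window, which may be why the chart changes as the zoom level (and with it the step) changes.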

0 Answers