I'm trying to make a dashboard for disk space forecasting. I've got a Prometheus query like this:

predict_linear(
   (1-(disk_volume_available_bytes{instance=~"$server"} / disk_volume_total_bytes{instance=~"$server"}))[32d:1d],
   864000
) > 0.95

This works well enough at cutting the list of disks down to those that actually need attention. What I'd like to do next is have another query (in the same panel or a different one, it doesn't matter to me) that takes each disk identified by the previous query and gets me the actual/observed metrics. Said another way, if a disk is forecast to be above 95% full, I want both the forecast line and the actual usage data for that disk. And if it's forecast to be below 95%, don't display anything, neither the forecast nor the actual usage.

Is this possible?

Ben Thul

1 Answer

Here is an example that shows node_exporter_build_info only for those instances where CPU utilization is over 30% (0.3):

node_exporter_build_info # this is the metric you want to see filtered
and on (instance) # and the rest is the filter terms, you won't see this on the panel
((1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.3)

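The tricky part here is choosing the labels to match on. The and operator keeps a left-hand series whenever any right-hand series has the same values for the on (...) labels, so the list has to identify each series uniquely; otherwise one disk over the threshold would pull in every other disk that shares those labels. In the example above the only unique label is instance, but in your case there might also be device or mountpoint, so you may need something like this: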

the_metric_you_wanna_see
and on (instance, device, mountpoint)  # put here a list of unique labels
(predict_linear(
   (1-(disk_volume_available_bytes{instance=~"$server"} / disk_volume_total_bytes{instance=~"$server"}))[32d:1d],
   864000
) > 0.95)
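
For the disk usage in the question, a minimal sketch of the filled-in query could look like the one below. It assumes the disk metrics carry device and mountpoint labels, which the question does not confirm; trim the on (...) list to the labels your exporter actually sets.

```promql
# Observed used fraction: this is what the panel draws
(1 - (disk_volume_available_bytes{instance=~"$server"} / disk_volume_total_bytes{instance=~"$server"}))
and on (instance, device, mountpoint)  # assumed labels, adjust to your exporter
# Filter: keep only disks forecast to be more than 95% full in 10 days (864000 s)
(predict_linear(
   (1 - (disk_volume_available_bytes{instance=~"$server"} / disk_volume_total_bytes{instance=~"$server"}))[32d:1d],
   864000
) > 0.95)
```

The forecast line itself can go in a second query in the same panel, using the original predict_linear(...) > 0.95 expression unchanged.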

Also, since the query in question is rather expensive to compute and you need to repeat it once or twice, I suggest making Prometheus pre-calculate it with a recording rule: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
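
A recording rule for this might look like the sketch below. The group name and the rule name disk:used_ratio:predict_linear_10d are made up for illustration, and the $server filter is left out because dashboard variables are not available inside rule files; filter by instance on the dashboard instead.

```yaml
groups:
  - name: disk_forecast                          # hypothetical group name
    rules:
      # Forecast of the used fraction 10 days (864000 s) ahead,
      # fitted over the last 32 days at a 1-day resolution.
      - record: disk:used_ratio:predict_linear_10d   # hypothetical rule name
        expr: |
          predict_linear(
            (1 - (disk_volume_available_bytes / disk_volume_total_bytes))[32d:1d],
            864000
          )
```

With that in place, the filter term in the dashboard query shrinks to `(disk:used_ratio:predict_linear_10d{instance=~"$server"} > 0.95)`.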

anemyte
  • I just had a chance to test this out and it's not quite what I'm looking for. What it looks like is happening is "display series B when series A exceeds the threshold" where what I want is "display series B if series A ever exceeds the threshold". But I feel like this is a step in the right direction and gives me another avenue to explore. Thanks! – Ben Thul Nov 30 '22 at 19:44
  • @BenThul it is possible if `ever` means the dashboard time range. In that case you can make the recording rule I suggested and then create a set of dashboard variables, one per important label, each extracting `label_values(your_new_recording_rule, label_A)`. Let me know if you need more details on how to do that, I'll expand my answer. – anemyte Nov 30 '22 at 22:24
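
A rough sketch of that variable-based approach, assuming Grafana and a recording rule (hypothetically named disk:forecast_over_95) whose expression ends in > 0.95, so that the recorded series only has samples while the forecast is above the threshold. With the variable set to refresh on time range change, label_values should then return only the instances that crossed the threshold at some point in the dashboard time range. The variable name flagged_instance is also made up.

```
# Dashboard variable "flagged_instance" (type: query, multi-value, include all):
label_values(disk:forecast_over_95, instance)

# Panel query: observed used fraction, restricted to the flagged instances
1 - (disk_volume_available_bytes{instance=~"$flagged_instance"} / disk_volume_total_bytes{instance=~"$flagged_instance"})
```

Filtering by instance alone shows every disk on a flagged server; a second variable for device or mountpoint, defined the same way, would narrow it further, provided the metrics have such a label.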