My goal is to observe metrics (CPU, memory usage, etc.) with Prometheus on a server and on the Docker containers running on it. Before sending an alarm, I would like to compare the current values of those metrics with a quantile, e.g. the 0.95 quantile. However, after several weeks of searching the internet I still struggle to create metrics for such quantiles. I am therefore asking here for help or advice on how a quantile can be created for a given metric.
Background
The code base is a fork of the docprom repository. This code relies on Prometheus for monitoring. Prometheus retrieves its data from a running cAdvisor container. The provided metrics of cAdvisor for Prometheus can be seen on the following page. However, it provides only Gauge and Counter metric types. During my research I was not able to find parameters that would enable modifications/extensions of those provided metrics.
Problem
According to my current understanding, the metric type should be a Histogram or Summary in order to observe the quantiles. What is the best approach to use the histogram_quantile query on the metrics provided by cAdvisor?
My current idea is to
- create a custom server
- fetch the desired data from Prometheus
- calculate the desired data
- provide it as a metric from the server, so that Prometheus can scrape it
- run histogram_quantile on the custom metric
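To make the "calculate the desired data" and "provide it as a metric" steps more concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under assumptions: the sample values, the bucket bounds, the metric name container_cpu_usage_percent, and both helper functions are hypothetical, and fetching from Prometheus and serving the text over HTTP are left out.

```python
def bucket_counts(samples, upper_bounds):
    """Return cumulative counts: how many samples are <= each upper bound."""
    return [sum(1 for s in samples if s <= bound) for bound in sorted(upper_bounds)]


def render_histogram(name, container, samples, upper_bounds):
    """Render the samples as one histogram in the Prometheus text exposition format."""
    bounds = sorted(upper_bounds)
    lines = [f"# TYPE {name} histogram"]
    for bound, count in zip(bounds, bucket_counts(samples, bounds)):
        lines.append(f'{name}_bucket{{name="{container}",le="{bound}"}} {count}')
    # The +Inf bucket always equals the total number of samples.
    lines.append(f'{name}_bucket{{name="{container}",le="+Inf"}} {len(samples)}')
    lines.append(f'{name}_sum{{name="{container}"}} {sum(samples)}')
    lines.append(f'{name}_count{{name="{container}"}} {len(samples)}')
    return "\n".join(lines) + "\n"


# Hypothetical CPU usage percentages, standing in for values fetched from Prometheus.
cpu_percent = [12.0, 15.5, 11.0, 40.0, 13.5, 14.0, 90.0, 16.0, 12.5, 13.0, 14.5]
print(render_histogram("container_cpu_usage_percent", "CONTAINER_NAME",
                       cpu_percent, [10, 25, 50, 75, 100]))
```

Note that this only shows the shape of the data for one snapshot of samples. As far as I understand, the buckets of a real Prometheus histogram are cumulative counters that only grow over time, so the custom server would also have to keep state between scrapes.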
Is this the right approach to create a metric that can be used with quantiles?
For example, I would like to fire an alarm if a certain container's CPU usage exceeds its 0.95 quantile. The query I use for the CPU usage is shown below as an example:
sum(rate(container_cpu_usage_seconds_total{name="CONTAINER_NAME"}[10m])) / count(node_cpu_seconds_total{mode="system"}) * 100
What would be the best approach to create the desired quantiles? Am I on the right path, or am I missing something simple here? It seems far too complicated for what should be a simple query with a quantile.
I am thankful for all help and information.