I am using the common monitoring tools (Prometheus, cAdvisor, AlertManager), and one of my servers fires the ContainerCpuUsage alert every 30 minutes.
Unfortunately, I do not know which container triggers it. (I am guessing it is cAdvisor itself, since it uses more CPU than the other containers, but its CPU usage is still really low!) So my first question is: is there any way to make AlertManager, based on the Prometheus rules, also send the container name?
cadvisor-rule.yaml
- alert: ContainerCpuUsage
  expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Container CPU usage (instance {{ $labels.instance }})"
    description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
I've tried {{ $labels.name }} and {{ $labels.job }} in the annotations, but neither shows the container name.
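For reference, here is a sketch of what that attempt might look like. Where exactly the two templates were placed is an assumption; the rule is otherwise the same as in cadvisor-rule.yaml.

```yaml
# Hypothetical variant of the rule above; the placement of the templates is an assumption.
- alert: ContainerCpuUsage
  expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    # name is kept by BY (instance, name), so it renders only when the
    # firing series actually carries a non-empty name label.
    summary: "Container CPU usage (instance {{ $labels.instance }}, container {{ $labels.name }})"
    # job is not listed in BY (instance, name), so the aggregation drops it
    # and this template renders as an empty string.
    description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n JOB = {{ $labels.job }}"
```

Since the aggregation keeps only instance and name, an empty container name in the alert would suggest that the firing series has no name label at all, though I have not confirmed this.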
So let's call the instance A; it runs an nginx container and a cAdvisor container. The monitoring tools run on another instance. How can I get the container names into the rule's labels, or is there another way to do this?