I have a Prometheus alert rule for the CPU utilisation of my pods, set up as shown below for apps in a specific namespace. If utilisation crosses 90%, the rule fires and triggers a PagerDuty alert. (I have removed some specific details, but the rule is almost exactly as below.)
- alert: CpuUtilizationWarning
  expr: avg by (kubernetes_io_zone) (rate(container_cpu_usage_seconds_total{pod=~"app.*", container="app", namespace="app"}[5m])) / on() group_left() avg(kube_pod_container_resource_requests_cpu_cores{pod=~"app.*", container="app", namespace="apps"}) * 100 > 90
  for: 5m
  labels:
    severity: warning
    service: location
    product: backend-app
  annotations:
    description: 'CPU utilization of {{ $labels.product }} pod is exceeding {{ $labels.severity }} and value is {{ $value }} %'
    summary: '{{ $labels.severity }} CPU utilization of {{ $labels.product }} pod'
The alerts arrive in PagerDuty (PD) as shown below. This is a sample of an alert that was triggered:
Labels:
- alertname = CpuUtilizationWarning
- monitor = prometheus
- product = backend-app
- service = location
- severity = warning
- kubernetes_io_zone = us-east-1c
Annotations:
- description = CPU utilization of pod is exceeding and value is 95.099865567118 %
- summary = CPU utilization of pod
But as you can see from the description and summary annotations above, the custom labels {{ $labels.product }} and {{ $labels.severity }} are not being interpolated: their values are missing from both annotations in the PD alert that was triggered.
It should look like this in PD:
Annotations:
- description = CPU utilization of backend-app pod is exceeding warning and value is 95.099865567118 %
- summary = warning CPU utilization of backend-app pod
That is, backend-app should appear in place of {{ $labels.product }} and warning in place of {{ $labels.severity }}.
I've checked the logs, Alertmanager, and the Alerts tab, but couldn't find anything that explains why the label values are not being picked up.
Any suggestions on why PD is not picking up the custom labels?
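One way I could narrow this down is with a throwaway rule like the sketch below. It rests on an assumption I have not confirmed: that `$labels` in annotation templates only exposes the labels carried by the expression result (here `kubernetes_io_zone`, because of the `avg by`), and not the labels set under the rule's own `labels:` block. The group name, alert name, and the `zone_check` / `product_check` annotation names are hypothetical, and the expression is deliberately shortened so that the rule fires easily.

```yaml
# Diagnostic sketch only, not the production rule.
groups:
  - name: cpu-template-check               # hypothetical group name
    rules:
      - alert: CpuUtilizationTemplateCheck # hypothetical alert name
        # Shortened expression: keeps the same grouping label as the real rule.
        expr: avg by (kubernetes_io_zone) (rate(container_cpu_usage_seconds_total{pod=~"app.*", container="app", namespace="app"}[5m])) > 0
        for: 5m
        labels:
          severity: warning
          product: backend-app
        annotations:
          # kubernetes_io_zone is part of the expression result,
          # so this is expected to render with a value.
          zone_check: 'zone is {{ $labels.kubernetes_io_zone }}'
          # product is set only under labels: above. If this renders empty,
          # the rule-level labels are not visible to $labels at this point.
          product_check: 'product is {{ $labels.product }}'
          # Possible fallback: write the static values directly into the text.
          description: 'CPU utilization of backend-app pod is exceeding warning and value is {{ $value }} %'
```

If `zone_check` is filled in while `product_check` stays empty, that would point at the rule templates rather than at PD, and hardcoding the static values as in the last annotation would be one possible workaround.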