
I have configured an alert for node memory usage in Prometheus. My alert rule is as follows:

- alert: NodeMemory Usage(development)
  annotations:
    description: '{{$labels.instance}} Memory usage is critical (current value is: {{ $value }})'
    summary: High Memory usage detected
  expr: |
    1 - sum by(node) ((node_memory_MemFree{job="node-exporter"} + node_memory_Cached{job="node-exporter"} + node_memory_Buffers{job="node-exporter"}) * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:) / sum by(node) (node_memory_MemTotal{job="node-exporter"} * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:) > 0.70
  for: 1s
  labels:
    severity: warning

I receive the name of the node within the alert when the threshold is exceeded for a single node (the node name here is nodes-3z4c), as follows:

[FIRING:1]  (NodeMemory Usage(development) nodes-3z4c monitoring/k8s warning)

Memory usage is critical (current value is: 0.7148033249432908)

But the issue is that when multiple nodes exceed the threshold, the names of those nodes are not included in the alert notification, and I get a notification like this:

[FIRING:4] NodeMemory Usage (monitoring/k8s)
Memory usage is critical (current value is: 0.7319404231240473)
Memory usage is critical (current value is: 0.7856648253333621)

Can someone help me figure out the issue?

manu thankachan

1 Answer


This has nothing to do with how you defined the alert. If you look at it in the Alertmanager UI, you'll see that all labels are there.

It's either the notification template you use (if Alertmanager is sending the message directly) or whatever webhook handler you're using that is only keeping the common labels (those shared by every alert in the group) and dropping everything else.
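For example, if Alertmanager is posting to Slack directly, a minimal sketch of a receiver that ranges over every alert in the group rather than relying only on the common labels could look like the following. The receiver name and channel are placeholders, and the `node` label comes from the `sum by(node)` in your alert expression:

receivers:
  - name: 'slack-notifications'        # placeholder receiver name
    slack_configs:
      - channel: '#alerts'             # placeholder channel
        title: '{{ .CommonAnnotations.summary }} ({{ .Alerts.Firing | len }} firing)'
        # Iterate over every alert in the group so per-alert labels
        # such as "node" are not lost when several alerts are grouped.
        text: >-
          {{ range .Alerts }}
          {{ .Labels.node }}: {{ .Annotations.description }}
          {{ end }}

The same idea applies to a custom webhook handler: iterate over the `alerts` array in the payload and read each entry's `labels`, instead of only the top-level `commonLabels`.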

Alin Sînpălean