
I have the kube-prometheus stack set up in my cluster. I need a rule that alerts me whenever a new node is created.

My cluster runs on AWS EKS, and the monitoring stack consists of these components: Alertmanager, Prometheus, prometheus-operator, node-exporter, and Grafana.

I already have a rule that fires when a node is down:

- alert: InstanceDown
  expr: |
    up == 0
  labels:
    severity: critical
  annotations:
    summary: "Instance [{{ $labels.instance }}] down"
    description: "Instance [{{ $labels.instance }}] down"
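With the kube-prometheus stack, rules like this are usually deployed as a PrometheusRule resource, and prometheus-operator only loads it when the resource's labels match the `ruleSelector` of the Prometheus resource. The sketch below assumes the Helm chart's default behaviour of selecting on a `release` label equal to the Helm release name; the name, namespace, group name and label value are hypothetical placeholders, so check the real selector with `kubectl get prometheus -A -o yaml`.

```yaml
# Sketch: the InstanceDown rule wrapped in a PrometheusRule resource.
# All metadata values are hypothetical placeholders.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-alerts                  # hypothetical name
  namespace: monitoring              # assumed namespace of the monitoring stack
  labels:
    release: kube-prometheus-stack   # assumed; must match the Prometheus ruleSelector
spec:
  groups:
    - name: node.rules               # hypothetical group name
      rules:
        - alert: InstanceDown
          expr: |
            up == 0
          labels:
            severity: critical
          annotations:
            summary: "Instance [{{ $labels.instance }}] down"
            description: "Instance [{{ $labels.instance }}] down"
```

If the label does not match the selector, the resource is created without errors but the rule never appears on Prometheus' `/alerts` page.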

I need a rule that sends an alert whenever a new worker node is created.

I tried the following query in Grafana to check whether it returns anything, but when the cluster autoscaler created a new node, the query returned no results. I got this query from this case:

(group by (kubernetes_io_hostname, kubernetes_io_role) (container_memory_working_set_bytes ) * 0 or group by (kubernetes_io_hostname, kubernetes_io_role) (delta ( container_memory_working_set_bytes[1m]))) == 1

I then added it as a Prometheus rule. The cluster autoscaler created new nodes, but I didn't get any alerts:

- alert: NewInstanceUP
  expr: |
    (group by (kubernetes_io_hostname, kubernetes_io_role) (container_memory_working_set_bytes) * 0
      or group by (kubernetes_io_hostname, kubernetes_io_role) (delta(container_memory_working_set_bytes[1m]))) == 1
  labels:
    severity: critical
  annotations:
    summary: "Instance [{{ $labels.instance }}] UP"
    description: "Instance [{{ $labels.instance }}] UP"
  • Would `up unless up offset 5m` suffice? It would fire when an `up` series exists now but did not exist 5 minutes ago. You can test this retrospectively in Grafana's Explore view (or Prometheus' graph page): just set the time range to when the new instances were created. – markalex Apr 12 '23 at 09:28
  • I tried this query in Grafana's Explore view and it showed some data. But when I add it as a Prometheus rule, I don't get any alerts: `- alert: Newnodecreated expr: | up unless up offset 5m` – Sebinn Sebastian Apr 13 '23 at 12:54
  • If you see results for this query in Explore, you should also get alerts. What does the `|` after `expr:` mean? I'm not familiar with this construction; could it cause an error? Also, check the `/alerts` page in your Prometheus web UI. Do you see this alert rule there? – markalex Apr 13 '23 at 12:59
  • Additionally, do you see any results for the query `ALERTS{alertname="Newnodecreated"}`? – markalex Apr 13 '23 at 13:00
  • @markalex I'm getting the alerts now. Thanks for the help. – Sebinn Sebastian Apr 18 '23 at 11:03
  • What was the cause of the problem? – markalex Apr 18 '23 at 11:04
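The `up unless up offset 5m` suggestion from the comments can be written as a complete rule. In YAML, the `|` after `expr:` starts a literal block scalar, which keeps the expression's line breaks and is valid in Prometheus rule files. The sketch below assumes that node-exporter targets are scraped with `job="node-exporter"` (check the actual job label on Prometheus' targets page); the alert name and severity are hypothetical. Because `unless` keeps the labels of its left-hand side, `$labels.instance` is filled in the annotations.

```yaml
- alert: NewNodeCreated            # hypothetical alert name
  # Matches node-exporter targets that have an `up` series now but had
  # none 5 minutes ago; the job label value is an assumption.
  expr: |
    up{job="node-exporter"} unless up{job="node-exporter"} offset 5m
  labels:
    severity: info                 # hypothetical; adjust to your Alertmanager routing
  annotations:
    summary: "New node [{{ $labels.instance }}] up"
    description: "Instance [{{ $labels.instance }}] started reporting within the last 5 minutes"
```

The alert stays active for roughly 5 minutes after a node first appears and then resolves on its own, once the offset window also contains the new series.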
