
I have configured Prometheus Alertmanager on an Ubuntu server to monitor multiple Azure VMs. Currently all VM instance alerts are notified to a default email group. I need to trigger alerts to

  1. Team A (user1, user2, user3) and the default group if Server A (matched by job name) goes down.
  2. Team B (User1, User2) and the default group if Server B goes down.

I tried a few combinations of route configs (given below) in alertmanager.yml, but it didn't work as expected.
Help is appreciated if anyone can explain the logic behind sending group-specific alert notifications in Alertmanager.
Thanks for your time!

route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h

  receiver: 'default-receiver'

  routes:
  - match:
      alertname: A_down
    receiver: TeamA
  - match:
      alertname: B_down
    receiver: TeamB

My current alertmanager.yml file:

global:
  resolve_timeout: 1m

route:
  receiver: 'email-notifications'

receivers:
- name: 'email-notifications'
  email_configs:
  - to: alertgroups@example.com
    from: default@example.com
    smarthost: smtp.gmail.com:587
    auth_username: default@example.com
    auth_identity: default@example.com
    auth_password: password
    send_resolved: true

alertrule.yml file:

groups:
- name: alert.rules
  rules:
  - alert: InstanceDown
    # Condition for alerting
    expr: up == 0
    for: 1m
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Instance {{ $labels.instance }} down'
      description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'critical'

  - alert: HostOutOfMemory
    # Condition for alerting
    expr: node_memory_MemAvailable / node_memory_MemTotal * 100 < 80
    for: 5m
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Host out of memory (instance {{ $labels.instance }})'
      description: "Node memory is filling up (< 80% available)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'warning'

  - alert: HostHighCpuLoad
    # Condition for alerting (CPU usage = 100 - idle percentage, averaged across cores per instance)
    expr: 100 - (avg by (instance) (irate(node_cpu{job="node_exporter_metrics",mode="idle"}[5m])) * 100) > 80
    for: 5m
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Host high CPU load (instance {{ $labels.instance }})'
      description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'warning'

  - alert: HostOutOfDiskSpace
    # Condition for alerting
    expr: (node_filesystem_avail{mountpoint="/"} * 100) / node_filesystem_size{mountpoint="/"} < 70
    for: 5m
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Host out of disk space (instance {{ $labels.instance }})'
      description: "Disk is almost full (< 70% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

1 Answer


A route's receiver field accepts a single receiver name, not a list. To notify both the default group and the team, repeat each match with continue: true so the alert keeps traversing the routing tree and is delivered to both receivers:

  routes:
  - match:
      alertname: A_down
    receiver: default-receiver
    continue: true
  - match:
      alertname: A_down
    receiver: TeamA
  - match:
      alertname: B_down
    receiver: default-receiver
    continue: true
  - match:
      alertname: B_down
    receiver: TeamB

Don't forget to define default-receiver, TeamA and TeamB using the "receivers" block.
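For reference, a minimal sketch of those receivers, reusing the SMTP settings from the question (the team addresses below are placeholders; the to field accepts a comma-separated list of recipients):

receivers:
- name: 'default-receiver'
  email_configs:
  - to: alertgroups@example.com
    from: default@example.com
    smarthost: smtp.gmail.com:587
    auth_username: default@example.com
    auth_identity: default@example.com
    auth_password: password
    send_resolved: true
- name: 'TeamA'
  email_configs:
  - to: 'user1@example.com, user2@example.com, user3@example.com'  # placeholder Team A addresses
    from: default@example.com
    smarthost: smtp.gmail.com:587
    auth_username: default@example.com
    auth_identity: default@example.com
    auth_password: password
    send_resolved: true
- name: 'TeamB'
  email_configs:
  - to: 'user1@example.com, user2@example.com'  # placeholder Team B addresses
    from: default@example.com
    smarthost: smtp.gmail.com:587
    auth_username: default@example.com
    auth_identity: default@example.com
    auth_password: password
    send_resolved: true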

  • Hi Marcelo, thank you for your response. I understand your solution, but I have a small query: alertrule.yml is configured for all the targets added to Prometheus, so how can I specify the rule for a specific job name, to send email alerts to the groups only if that particular target instance goes down? – Pratik M Sep 16 '21 at 07:06
  • Is this another question? You didn't mention anything about routing by job name, did you? It's possible, but first it's necessary to understand exactly what you want to accomplish. – Marcelo Ávila de Oliveira Sep 16 '21 at 10:31
  • No, same question; apologies for the confusion caused, I have edited the question title. The initial requirement is that Prometheus alerts should go to the default group (monitoring team) plus the server-specific team (server owners, selected leads) when a particular instance (job name) meets the global rules defined in alertrule.yml. For example, say Server A goes down from the listed servers: the default monitoring team and the team associated with Server A should get those server-instance-specific alerts. – Pratik M Sep 16 '21 at 11:11
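
For the per-job requirement described in the comments, one possible sketch is to match on the job label carried by the InstanceDown alert instead of a per-server alertname; the job names server_a and server_b below are hypothetical and should match the job_name values defined in prometheus.yml:

route:
  receiver: 'default-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  routes:
  - match:
      alertname: InstanceDown
      job: server_a            # hypothetical job name for Server A
    receiver: default-receiver
    continue: true
  - match:
      alertname: InstanceDown
      job: server_a
    receiver: TeamA
  - match:
      alertname: InstanceDown
      job: server_b            # hypothetical job name for Server B
    receiver: default-receiver
    continue: true
  - match:
      alertname: InstanceDown
      job: server_b
    receiver: TeamB

Alerts whose job label matches neither route fall through to the root receiver, so the default group still gets every notification.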