I have configured prometheus alertmanager on Ubuntu server to monitor multiple azure vms. Currently all the vm instance alerts are notified to a default email group. I need to trigger alert to
- Team A(user1,user2,user3) & default group if Server A (using Jobname) goes down.
- Team B(User1,User2) & default group if server B goes down.
Tried few combinations with route configs given below in alertmanager.yml but it didn't work as expected.
Help appreciated if anyone can explain the logic behind the sending group specific alert notifications in alertmanager.
Thanks for you time!
route:
group_wait: 30s
group_interval: 5m
repeat_interval: 2h
receiver: 'default-receiver'
routes:
- match:
alertname: A_down
receiver: TeamA
- match:
alertname: B_down
receiver: TeamB
My current Alertmanager.yml file:
global:
resolve_timeout: 1m
route:
receiver: 'email-notifications'
receivers:
- name: 'email-notifications'
email_configs:
- to: alertgroups@example.com
from: default@example.com
smarthost: smtp.gmail.com:587
auth_username: default@example.com
auth_identity: default@example.com
auth_password: password
send_resolved: true
alertrule.yml file:
groups:
- name: alert.rules
rules:
- alert: InstanceDown
# Condition for alerting
expr: up == 0
for: 1m
# Annotation - additional informational labels to store more information
annotations:
title: 'Instance {{ $labels.instance }} down'
description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
# Labels - additional labels to be attached to the alert
labels:
severity: 'critical'
- alert: HostOutOfMemory
# Condition for alerting
expr: node_memory_MemAvailable / node_memory_MemTotal * 100 < 80
for: 5m
# Annotation - additional informational labels to store more information
annotations:
title: 'Host out of memory (instance {{ $labels.instance }})'
description: 'Node memory is filling up (< 25% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
# Labels - additional labels to be attached to the alert
labels:
severity: 'warning'
- alert: HostHighCpuLoad
# Condition for alerting
expr: (sum by (instance) (irate(node_cpu{job="node_exporter_metrics",mode="idle"}[5m]))) > 80
for: 5m
# Annotation - additional informational labels to store more information
annotations:
title: 'Host high CPU load (instance {{ $labels.instance }})'
description: 'CPU load is > 30%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
# Labels - additional labels to be attached to the alert
labels:
severity: 'warning'
- alert: HostOutOfDiskSpace
# Condition for alerting
expr: (node_filesystem_avail{mountpoint="/"} * 100) / node_filesystem_size{mountpoint="/"} < 70
for: 5m
# Annotation - additional informational labels to store more information
annotations:
title: 'Host out of disk space (instance {{ $labels.instance }})'
description: 'Disk is almost full (< 50% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'