
I have two services, A and B, which I want to monitor. I also have two different notification channels, X and Y, in the form of receivers in the Alertmanager config file.

I want to notify X if service A goes down and notify Y if service B goes down. How can I achieve this in my configuration?

My AlertManager YAML file is:

    route:
      receiver: X

    receivers:
      - name: X
        email_configs:

      - name: Y
        email_configs:

And my alert.rules file is:

    groups:

    - name: A
      rules:
        - alert: A_down
          expr: expression
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "A is down"

    - name: B
      rules:
        - alert: B_down
          expr: expression
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "B is down"

2 Answers


The config should roughly look like this (not tested):

    route:
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 2h

      receiver: 'default-receiver'

      routes:
      - match:
          alertname: A_down
        receiver: X
      - match:
          alertname: B_down
        receiver: Y

The idea is that each [`route`](https://prometheus.io/docs/alerting/configuration/#%3Croute%3E) can have a `routes` field, where you can put a different config that gets enabled when the alert's labels match the conditions in `match`.
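
Applied to the question's setup, a complete file could look roughly like this (untested sketch; the SMTP settings and e-mail addresses are placeholders you'd replace with your own):

    global:
      # placeholder SMTP settings - replace with your own mail relay
      smtp_smarthost: 'localhost:25'
      smtp_from: 'alertmanager@example.org'

    route:
      # anything that doesn't match a sub-route below falls back to X
      receiver: X
      routes:
      - match:
          alertname: A_down
        receiver: X
      - match:
          alertname: B_down
        receiver: Y

    receivers:
      - name: X
        email_configs:
        - to: 'team-x@example.org'
      - name: Y
        email_configs:
        - to: 'team-y@example.org'

You could also match on the `severity` label (critical/warning) instead of `alertname` in the same way, since the rules already set it.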

To clarify, the general flow for handling an alert in Prometheus (Alertmanager and Prometheus integration) is like this:

SomeErrorHappensInYourConfiguredRule (Rule) -> RouteToDestination (Route) -> TriggerAnEvent (Receiver) -> GetAMessageInSlack/PagerDuty/Mail/etc...

For example:

if my AWS machine cluster production-a1 is down, I want to trigger an event that notifies both PagerDuty and Slack, sending my team the relevant error.
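
As a rough sketch of that scenario (not tested; the `cluster` label, the receiver names, the Slack webhook URL and the PagerDuty routing key are all made-up placeholders), a single receiver can carry both a PagerDuty and a Slack config, so one matching route notifies both channels:

    route:
      receiver: default
      routes:
      - match:
          cluster: production-a1
        receiver: teamPagerDutyAndSlack

    receivers:
    - name: default
    - name: teamPagerDutyAndSlack
      pagerduty_configs:
      - routing_key: <yourPagerDutyKey>
      slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ
        channel: '#alerts'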

There are 3 files that are important for configuring alerts in your Prometheus system:

  1. alertmanager.yml - configuration of your routes (receiving the triggered alerts) and receivers (how to handle these alerts)
  2. rules.yml - this file contains all the thresholds and rules you'll define in your system.
  3. prometheus.yml - global configuration that ties your rules, routes and receivers (the two above) together.

I'm attaching a dummy example to demonstrate the idea. In this example I'll watch for overload on my machine (using the node exporter installed on it). On /var/data/prometheus-stack/alertmanager/alertmanager.yml:

    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'localhost:25'
      smtp_from: 'JohnDoe@gmail.com'
    
    route:
      receiver: defaultTrigger
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 6h
      routes:
      - match_re:
          service: service_overload
          dev_team: myteam
        receiver: pagerDutyTrigger
    
    receivers:
    - name: 'pagerDutyTrigger'
      pagerduty_configs:
      - send_resolved: true
        routing_key: <myPagerDutyToken>
    # the default receiver named in the route must also be defined, even with
    # no notification configs, otherwise Alertmanager refuses to load the file
    - name: 'defaultTrigger'

Add a rule on /var/data/prometheus-stack/prometheus/yourRuleFile.yml:

    groups:
    - name: alerts
      rules:
      - alert: service_overload_more_than_5000
        expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
        for: 10m
        labels:
          service: service_overload
          severity: pager
          dev_team: myteam
        annotations:
          dev_team: myteam
          priority: Blocker
          identifier: '{{ $labels.name }}'
          description: 'service overflow'
          value: '{{ humanize $value }}%'

On /var/data/prometheus-stack/prometheus/prometheus.yml add this snippet to integrate Alertmanager:

    global:
     
    ...
    
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager:9093"
    
    rule_files:
      - "yourRuleFile.yml"
    
    ...

Pay attention that the key point of this example is the service_overload label, which connects and binds the rule to the right receiver.
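
Stripped down to just that binding (excerpts from the two files above), the label the rule attaches is exactly what the route matches on:

    # yourRuleFile.yml (excerpt): the firing alert carries this label
    labels:
      service: service_overload

    # alertmanager.yml (excerpt): the route matches that label and picks the receiver
    routes:
    - match_re:
        service: service_overload
      receiver: pagerDutyTrigger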

Reload the config (restart the service, or stop and start your Docker containers) and test it. If it's configured well, you can watch the alerts at http://your-prometheus-url:9090/alerts
