I have a problem with inhibition rules: I need exceptions to them.

For example, we have 50 teams, and each team handles its own alerts. When a data center goes down (e.g. because of network problems), we want to inhibit all alerts except those for Team_1234567890 and Team_ABCDEFGHIJ.

The problem is that Alertmanager doesn't support negative matchers for inhibition; see "Negative matchers for routing and inhibition" (#1023): https://github.com/prometheus/alertmanager/issues/1023

Go (and therefore Prometheus and Alertmanager) doesn't support the "?!" negative lookahead in regular expressions: https://github.com/google/re2/wiki/Syntax

How can I set up inhibition rules for this example?

Thanks, Denis

Denis

3 Answers

Until negative matching is implemented in Alertmanager, you need to add unique routes for those two teams and inhibit the other teams as normal.
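For reference, here is a minimal sketch of what the rule could look like once negative matchers are available. It assumes Alertmanager 0.22 or later, where inhibit rules accept `source_matchers` and `target_matchers` with the `!=` and `!~` operators. The alert name `DataCenterDown` and the `datacenter` label are hypothetical placeholders, not taken from the question.

```yaml
# alertmanager.yml (fragment), assuming Alertmanager 0.22+ matcher syntax
inhibit_rules:
  - source_matchers:
      - 'alertname = "DataCenterDown"'   # hypothetical name of the "data center is down" alert
    target_matchers:
      # inhibit every alert whose team is NOT one of the two exempt teams
      - 'team !~ "Team_1234567890|Team_ABCDEFGHIJ"'
    equal: ['datacenter']                # hypothetical label shared by source and target alerts
```

On older versions this syntax is not accepted, so one of the workarounds in this thread is still needed.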

Or, if you want to go the silence route, see amtool: https://github.com/prometheus/alertmanager/blob/master/README.md#amtool

A more detailed man page can be found here: https://manpages.debian.org/testing/prometheus-alertmanager/amtool.1.en.html

You can add a silence using amtool to snooze all alerts for the other 48 (50-2) teams as soon as the first network-down alert is triggered.

You do need to be creative about when to insert and remove the silence.

Unless you already have a list of the teams who don't want to be alert-stormed, you need to run a negative-match PromQL query to return those 48 team names and join them with |:

amtool silence add alertname=~".*" instance=~"team1|team2..."

Hang
  • Hi, Thank you for your answer. Could you please provide an example for a solution with routes? As I understand, inhibitions are global (https://www.robustperception.io/laying-out-alertmanager-routes) so I guess I'm missing something. – Denis Jul 28 '20 at 13:18

Julien Pivotto (roidelapluie on GitHub) has written a solution to this use case: https://github.com/prometheus/alertmanager/issues/1023#issuecomment-671851280

You could use this in the Prometheus configuration:

alerting:
  alert_relabel_configs:
  - source_labels: [team]
    regex: Team_1234567890|Team_ABCDEFGHIJ
    target_label: dc_team_alert
    replacement: "yes"

and this in the Alertmanager inhibit rule:

target_match:
   dc_team_alert: ""
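
Putting the two fragments together, a fuller sketch might look like the following. Only the team regex and the `dc_team_alert` label come from the snippets above. The alert name `DataCenterDown` and the `datacenter` label in `equal` are hypothetical placeholders to be replaced with whatever your data center alert actually uses.

```yaml
# prometheus.yml (fragment): tag alerts from the two exempt teams
alerting:
  alert_relabel_configs:
    - source_labels: [team]
      regex: Team_1234567890|Team_ABCDEFGHIJ
      target_label: dc_team_alert
      replacement: "yes"

# alertmanager.yml (fragment): inhibit everything that was NOT tagged
inhibit_rules:
  - source_match:
      alertname: DataCenterDown   # hypothetical name of the "data center is down" alert
    target_match:
      dc_team_alert: ""           # an empty value matches alerts that lack the label
    equal: ['datacenter']         # hypothetical label shared by source and target alerts
```

The idea is to turn the negative condition ("team is not one of these two") into a positive one ("the marker label is absent"), which the existing equality matchers can express.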
Denis

I had a similar case. It turned out that - job: "!(dev_mapr_alarms_exporters)" - did the job for my specific case, and I was able to separate these two groups. Here is part of my config.

routes:
- receiver: "jiralert"
  group_wait: 10s
  match_re:
    severity: critical|warning
    job: "!(dev_mapr_alarms_exporters)"
  group_by: ['alertname', 'job']
  group_interval: 5m
  repeat_interval: 30m
  continue: true
- receiver: "jiralert"
  group_wait: 10s
  match_re:
    job: dev_mapr_alarms_exporters
  group_by: ['alertname', 'job']
  group_interval: 5m
  repeat_interval: 30m
  continue: true
Martin Nikolov