Questions tagged [prometheus-alertmanager]

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.
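The minimal alertmanager.yml sketch below shows where grouping, routing, and inhibition are configured. Every receiver name, address, SMTP host, and key in it is a hypothetical placeholder. The matchers syntax assumes Alertmanager v0.22 or later. Deduplication needs no configuration. Silences are created at runtime through the web UI or amtool, not in this file.

```yaml
# Hypothetical example: all hosts, addresses and keys are placeholders.
global:
  smtp_smarthost: 'smtp.example.com:587'   # required for email_configs
  smtp_from: 'alertmanager@example.com'

route:
  receiver: default-email                  # fallback receiver
  group_by: ['alertname', 'cluster']       # alerts sharing these labels are grouped
  group_wait: 30s                          # wait before sending the first notification for a group
  group_interval: 5m                       # wait before notifying about new alerts in a group
  repeat_interval: 4h                      # re-send interval for still-firing alerts
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall           # critical alerts are routed to PagerDuty

receivers:
  - name: default-email
    email_configs:
      - to: 'team@example.com'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: '<pagerduty-integration-key>'

# Inhibition: while a critical alert fires, mute matching warnings
# that share the same alertname and cluster labels.
inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['alertname', 'cluster']
```

Running amtool check-config on a file like this validates it before it is loaded.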

Source code is available here: https://github.com/prometheus/alertmanager

753 questions
2 votes, 0 answers

Alertmanager randomly getting error message "unexpected status code 422"

I have deployed Prometheus from the community Helm chart (14.6.0), which runs an Alertmanager that shows errors from time to time (templating issues), and the error message shows nothing else useful. The question is that I have retested the config via amtool…
2 votes, 1 answer

Prometheus is throwing "bad response status 401 Unauthorized" even after specifying the right config "basic_auth_users"

I've configured user & prod ("basic_auth_users") and passed those parameters as mentioned in the doc: --web.config.file. I am able to access the Prometheus UI and the Alertmanager UI independently (with the provided credentials), but I'm seeing the following error in…
Avis
2 votes, 1 answer

Prometheus server reload rules automatically

I'm using the prometheus-community Helm chart to deploy Prometheus in my cluster. I know that it is possible to configure a custom service discovery for discovering new targets dynamically, and this process does not require a reload operation in…
2 votes, 1 answer

absent alert always getting triggered in Prometheus/Alertmanager HA setup

We are switching to a regionally sharded Prometheus setup and using the AM setup below to deduplicate alerts (https://github.com/prometheus/alertmanager#high-availability). The deduping seems to be working fine, but absent alerts are causing issues. We…
dganesh2002
2 votes, 2 answers

How to Change Port Number in cadvisor and node-exporter of Prometheus Monitoring

Below is what the Prometheus URL shows when I click on Targets: cadvisor (0/1 up) and node-exporter (0/1 up). Here is my filename.yml file: version: '3.2' services: prometheus: image:…
2 votes, 0 answers

Prometheus Alertmanager: is dynamic alert name possible

Can we have dynamic alert names for Alertmanager alert rule? Such as: - alert: "Dependency Status - {{ $labels.serviceName }} unhealthy " expr: 100*avg_over_time(dependency_status{}[30m]) < 90 for: 1m labels: severity:…
Ken Tsoi
2 votes, 1 answer

How to use label names for prometheus alert expression?

I want to send a Prometheus alert if latency exceeds 100s, based on the Severity label that I passed to the gauge metric during instrumentation. I tried the following - alert: TestAppLatency expr: LATENCY>100 or {{ $labels.SEVERITY }} ==…
2 votes, 0 answers

Prometheus Alert Manager Wrong Resolved Notification Sent

For an alert condition, Prometheus is behaving incorrectly. The alert gets generated based on the condition and goes to different notification channels correctly. The problem is - after some time the alert resolves (at random time gaps, sometimes…
Arnav Bose
2 votes, 3 answers

How to monitor disk space usage for Kafka Brokers in AWS MSK cluster

We need to monitor disk space usage for Kafka brokers running in an AWS MSK cluster. There are several metrics emitted by Kafka which can be used to monitor various aspects, but I was unable to find any specific metric that monitors "Disk Usage" for…
2 votes, 1 answer

Alertmanager triggers webhook repeatedly for the same alert

I have configured an Alertmanager rule to trigger an alert when a Prometheus metric changes from 0 to 1. It triggers a webhook alert when the metric changes from 0 to 1, but Alertmanager keeps triggering the webhook, sending duplicate alerts for the same metric…
HariHaravelan
2 votes, 0 answers

AlertManager queue is dropping messages

I have been using alertmanager to send alerts to pagerduty, and after a while I started getting the following error: level=warn ts=2021-04-02T10:43:01.239Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued"…
Ido Trumer
2 votes, 0 answers

Prometheus alert rules - check if a label exists in the message field

An example of the rule to which I want to append a label if that label exists: - alert: TargetDown annotations: message: '{{`{{`}} printf "%.4g" $value {{`}}`}}% of the {{`{{`}} $labels.job {{`}}`}}/{{`{{`}} $labels.service {{`}}`}}…
Eli Halych
2 votes, 1 answer

How to match the PrometheusRule to the AlertmanagerConfig with Prometheus Operator

I have multiple PrometheusRules (rule a, rule b), and each rule defines a different expression to constrain the alert. I also have different AlertmanagerConfigs (one's receiver is Slack, the other's receiver is Opsgenie). How can we make a connection…
2 votes, 3 answers

Prometheus alerts first counter value

I'm trying to create an alert for errors. There is a metric which counts errors that occurred in an application, but when I try to catch its increase it always returns 0: increase(app_error[1h]). Even if I add an offset (offset 5h), to the point…
passwd
2 votes, 0 answers

Prometheus alert rule for Kafka Consumer lag

I want to track whether any application has stopped consuming from Kafka topics. For that, I added a Kafka consumer lag alert rule in Alertmanager, which sends alerts to a Slack channel whenever the condition is met. I am doing group by (consumergroup) and sum…