I'm experiencing a widespread issue with Prometheus to Alertmanager communication. Whenever an Alertmanager pod restarts, the Prometheus server logs a 503 error for that individual pod. The other Alertmanager pods keep receiving alerts until they are restarted as well.
Prometheus Version: 2.42.0
Alertmanager Version: 0.25.0
Istio Version: v1.17
Issue Description
I'm using an Istio mesh to connect Prometheus to Alertmanager. Whenever an Alertmanager pod gets restarted, I get the error below. If I restart the Prometheus server, the error goes away and it is able to establish a new connection to Alertmanager. It looks like Prometheus is caching the old pod IPs and the stale connections are never fully closed.
ts=2023-03-07T21:34:40.312Z caller=scrape.go:1351 level=debug component="scrape manager" scrape_pool=alertmanager target=http://am-0.monitoring.svc.cluster.local:9093/metrics msg="Scrape failed" err="server returned HTTP status 503 Service Unavailable"
alerting configuration:
alerting:
  alert_relabel_configs:
    - action: labeldrop
      regex: replica
      replacement: $1
      separator: ;
  alertmanagers:
    - static_configs:
        - targets:
            - am-0.monitoring.svc.cluster.local:9093
            - am-1.monitoring.svc.cluster.local:9093
            - am-2.monitoring.svc.cluster.local:9093
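In case it's relevant, I also considered letting Prometheus re-resolve the Alertmanager endpoints itself instead of pointing at per-pod static targets. This is only a sketch I haven't tried yet; the headless service name alertmanager-headless is an assumption, substitute whatever service fronts the pods:

alerting:
  alertmanagers:
    - dns_sd_configs:
        # Hypothetical headless service selecting the Alertmanager pods.
        # Prometheus re-queries the A records every refresh_interval,
        # so restarted pods should be picked up with their new IPs.
        - names:
            - alertmanager-headless.monitoring.svc.cluster.local
          type: A
          port: 9093
          refresh_interval: 30s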
Can you please help with this?