
We monitor KubeJobFailed via Prometheus with the expression `kube_job_failed{job="kube-state-metrics",namespace=~".*"} > 0`.

Right now we receive multiple alerts for the same job (the alert never gets resolved between notifications). We would like to be alerted only the first time a job fails. Is this related to the Prometheus expression, or should I edit the Alertmanager YAML? This is how it is currently set in the YAML:

    - match:
        alertname: 'KubeJobFailed'
      repeat_interval: 1h
      receiver: "slack-k8s-dev"
      continue: true
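
For reference, the alert itself comes from a Prometheus rule using the expression above; a minimal sketch of roughly what it looks like (the group name, `for:` duration, labels and annotations here are assumptions, the exact values in our setup may differ):

    groups:
      - name: kubernetes-jobs
        rules:
          - alert: KubeJobFailed
            # Fires for every Job that kube-state-metrics reports as failed.
            expr: kube_job_failed{job="kube-state-metrics",namespace=~".*"} > 0
            # Assumed duration; our rule may use a different value.
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete."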

P.S. We do not want to delete the failed jobs.

I've tried removing the repeat_interval, but then it just falls back to the default interval.
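
If Alertmanager simply can't do a true "notify only once", a workaround I'm considering (not applied yet) is to stretch repeat_interval on this route so that in practice only the first notification goes out while the alert keeps firing; a sketch, with an arbitrary "long enough" value:

    - match:
        alertname: 'KubeJobFailed'
      # Very long repeat_interval so the notification is effectively sent only
      # once while the alert stays in the firing state; the value is arbitrary.
      repeat_interval: 8760h    # roughly one year
      receiver: "slack-k8s-dev"
      continue: true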

  • Possibly related to this closed but unresolved issue: https://github.com/prometheus/alertmanager/issues/1685 – Roar S. Jul 09 '23 at 13:23
  • What is shown for the mentioned query in the Graph tab of Prometheus' web console? Is the graph there continuous? Is the graph for `ALERTS{alertname="KubeJobFailed"}` continuous as well? – markalex Jul 09 '23 at 20:30
