
I'm trying to create Prometheus alerts for my Kubernetes CronJob, using kube-state-metrics, for the following scenarios:

  1. If my CronJob fails, send an alert. If it is still failing a minute later, or another failed CronJob exists, keep re-sending the alert every 5 minutes; otherwise, resolve it.
  2. If my CronJob runs for more than a minute, send an alert.
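
For the "keep re-sending every 5 minutes" part of scenario 1, re-notification is normally handled by Alertmanager routing rather than by the alert expression itself. A minimal sketch, assuming a single receiver; the receiver name `cronjob-team` is hypothetical and its notification integrations are left out:

```yaml
# alertmanager.yml (sketch): re-send still-firing CronJob alerts every 5 minutes.
route:
  receiver: cronjob-team            # hypothetical receiver name
  group_by: ['alertname', 'namespace', 'job_name']
  group_wait: 30s                   # wait before the first notification for a new group
  group_interval: 5m                # wait before notifying about new alerts added to a group
  repeat_interval: 5m               # re-send while the alert is still firing
receivers:
  - name: cronjob-team              # add email_configs, slack_configs, etc. here
```

If you also want a notification when the alert resolves, set `send_resolved: true` on the integration you add to the receiver.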

I've tried `count_over_time(kube_job_failed[1m]) > 0`, which does fire a failure alert, but the alert never resolves.
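
One likely reason it never resolves: `count_over_time` counts the samples in the window regardless of their value, so the result stays above zero for as long as kube-state-metrics exports any `kube_job_failed` series, which, as far as I know, is as long as the failed Job object exists (a CronJob keeps up to `failedJobsHistoryLimit` failed Jobs, 1 by default). A minimal rule-file sketch that compares the value instead and also covers scenario 2, assuming kube-state-metrics v2-style labels (`namespace`, `job_name`, `condition`); the group and alert names are made up:

```yaml
# Prometheus rule file (sketch); group and alert names are hypothetical.
groups:
  - name: cronjob-alerts
    rules:
      # Scenario 1: a Job has a Failed condition with status "true".
      # `for: 1m` only fires if this is still true a minute later.
      # Caveat: it resolves only once no failed Job object remains
      # (deleted manually or via the history limit), not when a later run succeeds.
      - alert: CronJobFailed
        expr: kube_job_failed{condition="true"} == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed"
      # Scenario 2: a Job still has active pods more than 60 seconds after it started.
      - alert: CronJobRunningTooLong
        expr: |
          (time() - kube_job_status_start_time) > 60
          and on (namespace, job_name)
          kube_job_status_active > 0
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} has run for over a minute"
```

To restrict both rules to Jobs created by CronJobs, one option is an additional join on `kube_job_owner{owner_kind="CronJob"}`.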

Any guidance would be greatly appreciated.

Jonas
Matt
