I am trying to figure out how to create Prometheus alerts for my Kubernetes CronJob using kube-state-metrics, for the following scenarios:
- If my CronJob fails, send an alert. After a minute, if it is still failing or another failed CronJob run exists, keep sending an alert every 5 minutes; otherwise, resolve the alert.
- If my CronJob runs for over a minute, send an alert.
I've tried the following expression, which gives me a failure alert, but the alert never resolves itself:
count_over_time(kube_job_failed[1m]) > 0
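One likely reason the expression above never resolves is that count_over_time counts samples regardless of their value, so the result stays above 0 for as long as the series exists at all. Below is a rough, untested sketch of the kind of rules that might fit the two scenarios. It assumes the kube-state-metrics Job metrics kube_job_status_failed, kube_job_status_start_time and kube_job_status_active are being scraped; the group name, alert names, labels and annotations are hypothetical. The "every 5 minutes" part is not something a Prometheus rule controls; it would presumably be set with repeat_interval on the Alertmanager route instead.

```yaml
# Sketch only: group and alert names are made up for illustration.
groups:
  - name: cronjob-alerts            # hypothetical group name
    rules:
      # Scenario 1: a Job created by the CronJob has failed pods.
      # Comparing the gauge value directly (instead of counting samples)
      # lets the alert resolve once no series is above 0 any more.
      - alert: CronJobRunFailed     # hypothetical alert name
        expr: kube_job_status_failed > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.job_name }} in {{ $labels.namespace }} has failed pods"

      # Scenario 2: a Job that is still active started more than 60s ago.
      - alert: CronJobRunTooLong    # hypothetical alert name
        expr: |
          (time() - kube_job_status_start_time) > 60
          and on (namespace, job_name)
          kube_job_status_active > 0
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.job_name }} in {{ $labels.namespace }} has been running for over a minute"
```

One caveat with this sketch: kube_job_status_failed keeps reporting a failed Job for as long as the Job object exists, so the first alert would only resolve once the failed Job is deleted, for example through the CronJob's failedJobsHistoryLimit.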
Any guidance would be greatly appreciated.