I am setting up a CronJob failure alert in Datadog using the following monitor query:
max(last_5m):max:kubernetes_state.job.completion.failed{kube_cronjob:cronjobnamexx} by {kube_cluster_name,kube_namespace,kube_cronjob} >= 1
The monitor alerts the first time the job fails, but the alert never clears, even after the job has run successfully several times. I resolved it manually after a successful run, but the alert was triggered again by the same query even though the job had not run again in the meantime.
In the evaluation graph, the value changes from 0 to 1 on the first failure and then stays at 1 indefinitely, regardless of the results of later CronJob runs. Can someone explain what I am missing here?
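To make the symptom concrete, here is a minimal Python sketch of how I understand the condition `max(last_5m) ... >= 1` to be evaluated. It is not Datadog's implementation: the function name `monitor_triggers` and the one-sample-per-minute data are made up for illustration, and the sample values simply mirror what I see in the evaluation graph (0, then 1 forever).

```python
from typing import List, Tuple

WINDOW_SECONDS = 300  # corresponds to last_5m in the query
THRESHOLD = 1         # corresponds to ">= 1" in the query


def monitor_triggers(points: List[Tuple[int, float]], now: int) -> bool:
    """Return True if the max of the samples in the last 5 minutes is >= THRESHOLD.

    `points` is a list of (timestamp_in_seconds, value) samples.
    """
    recent = [value for ts, value in points if now - WINDOW_SECONDS < ts <= now]
    return bool(recent) and max(recent) >= THRESHOLD


# Hypothetical samples, one per minute: the metric reports 0 twice,
# then 1 for every later sample, as in my evaluation graph.
values = [0, 0] + [1] * 9
points = [(i * 60, v) for i, v in enumerate(values)]

# Evaluate the condition at each sample time, using only the data seen so far.
results = [monitor_triggers(points[: i + 1], ts) for i, (ts, _) in enumerate(points)]
print(results)
```

As long as the reported value stays at 1, every evaluation window contains a 1, so the condition holds at every evaluation. That would match the alert re-triggering after I resolve it manually; what I do not understand is why the metric itself never returns to 0.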
I have searched in various places online and the query appears to be correct, but I cannot figure out what is wrong.