Datadog monitor alert failing for eks cronjob

Asked Jun 14 '23 at 14:02

Active Jun 14 '23 at 14:02

Viewed 123 times

I am setting cronjob failure alert using Datadog and using below query

max(last_5m):max:kubernetes_state.job.completion.failed{kube_cronjob:cronjobnamexx} by {kube_cluster_name,kube_namespace,kube_cronjob} >= 1

Above query sends alert for the first time and then the alert never clears even after the job runs successfully multiple times. I resolved it manually after successful job completion but still the alert gets triggered again for above query even though job never ran again.

What I observe in evaluation graph is value never changes after first time changing from 0 to 1 and then it stays forever at 1 independent of cronjob results. Can someone give some insights on what am I missing here ?

I searched in the various places in internet and query seems to be perfectly fine but I cannot figure out what is missing here.

asked Jun 14 '23 at 14:02

Samu

Please provide enough code so others can better understand or reproduce the problem. – Community Jun 15 '23 at 09:51

Datadog monitor alert failing for eks cronjob

0 Answers0