I am looking for help debugging an issue with the Airflow KubernetesPodOperator. We randomly get the error below when an Airflow task executes. The job is almost finished, and at the very end of its execution a "pods not found" exception is thrown. In reality, the Airflow task (a Python job) has already finished its work, but because of this exception Airflow marks the job as failed.
ERROR - (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'd4df122xx-bxcb-42f2-8c9e-768e9bbb00x9', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'xxxx-xxx-xxx-xxxxxxxx', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'xxxx-xxx-xxx-xxxxxxxx', 'Date': 'Sat, 17 Jul 2021 02:10:07 GMT', 'Content-Length': '258'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"xxxx.6cb9f2cc66d0455c882cb5bae007ae84\" not found","reason":"NotFound","details":{"name":"xxx.6cb9f2cc66d0455c882cb5bae007ae84","kind":"pods"},"code":404}
We save detailed logs in an Elasticsearch index, but there are no log entries from that particular time that would explain why Airflow cannot find the pod for this running job.
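For anyone trying to correlate this failure with other logs: the HTTP response body in the error is a Kubernetes Status object in JSON, so the pod name, reason, and code can be extracted from it and used as search terms. Below is a minimal sketch using only the Python standard library. It assumes the body is available as a string; `parse_k8s_status` is a hypothetical helper name for illustration, not part of Airflow or the Kubernetes client.

```python
import json


def parse_k8s_status(body: str) -> dict:
    """Pull the fields of interest out of a Kubernetes Status JSON body.

    Hypothetical helper for illustration only.
    """
    status = json.loads(body)
    details = status.get("details") or {}
    return {
        "code": status.get("code"),
        "reason": status.get("reason"),
        "kind": details.get("kind"),
        "name": details.get("name"),
        "message": status.get("message"),
    }


# Body copied from the error above (pod name redacted as in the log).
body = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"xxxx.6cb9f2cc66d0455c882cb5bae007ae84\" not found","reason":"NotFound","details":{"name":"xxx.6cb9f2cc66d0455c882cb5bae007ae84","kind":"pods"},"code":404}'

info = parse_k8s_status(body)
print(info["kind"], info["name"], info["reason"], info["code"])
```

The extracted `name` can then be used to search the Elasticsearch index, or the cluster's own events, for entries about that pod around the time given in the `Date` response header.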
Could an Airflow or Kubernetes expert point me in the right direction on how to investigate and fix this issue?