
I am looking for help debugging an issue with the Airflow KubernetesPodOperator. We randomly get the error below when an Airflow task executes. The job is almost finished, and at the end of its execution a "pods not found" exception is thrown. In reality the Airflow task, which is a Python job, has already finished its work, but because of this exception Airflow marks the task as failed.


ERROR - (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'd4df122xx-bxcb-42f2-8c9e-768e9bbb00x9', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'xxxx-xxx-xxx-xxxxxxxx', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'xxxx-xxx-xxx-xxxxxxxx', 'Date': 'Sat, 17 Jul 2021 02:10:07 GMT', 'Content-Length': '258'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"xxxx.6cb9f2cc66d0455c882cb5bae007ae84\" not found","reason":"NotFound","details":{"name":"xxx.6cb9f2cc66d0455c882cb5bae007ae84","kind":"pods"},"code":404}
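
To correlate this error with other logs, it may help to pull the pod name and status out of the response body. The following is only a minimal sketch using the standard library; `summarize_status` is a hypothetical helper name, and the body string is copied verbatim from the error above (including its redacted pod name).

```python
import json

# Body copied from the "HTTP response body" line of the error above.
# A raw string keeps the escaped quotes exactly as they were logged.
body = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"xxxx.6cb9f2cc66d0455c882cb5bae007ae84\" not found","reason":"NotFound","details":{"name":"xxx.6cb9f2cc66d0455c882cb5bae007ae84","kind":"pods"},"code":404}'


def summarize_status(raw):
    """Hypothetical helper: extract the fields useful for a log search."""
    status = json.loads(raw)
    details = status.get("details", {})
    return {
        "code": status.get("code"),
        "reason": status.get("reason"),
        "kind": details.get("kind"),
        "name": details.get("name"),
    }


summary = summarize_status(body)
print(summary)
```

The extracted pod name, together with the `Date` response header, gives a concrete key and timestamp to search for in whatever log store is in use.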

We save detailed logs in an Elasticsearch index, but there are no logs at that specific time that would explain why Airflow cannot find the pod for this running job.

Can an Airflow/Kubernetes expert point me in the right direction on how to investigate and fix this issue?

Madiha Khalid
  • Some prechecks you can perform: 1. Check that you are connecting to the right Kubernetes cluster. 2. Check the kubeconfig file. 3. Check that the Airflow operator refers to the correct namespace. 4. Check whether the pod is present and what state it is in. – deepak Jul 19 '21 at 05:27
  • This is a bug, which I am also encountering. The worker pod has exited, as you stated, and the logs should be read from a remote location, e.g. S3 if you configured it. – Johann8 Aug 03 '21 at 09:28
  • Hi, were you able to solve this? Is this truly a bug? I'm getting pretty much the same error, but have no clue where to start looking to fix this... Also using Airflow with KubernetesPodOperator. – Alain Feb 21 '22 at 22:12
  • I get the same, but this bug should have been fixed in November of 2021: https://github.com/apache/airflow/issues/15456#issuecomment-981635086 – Danielson Jun 15 '22 at 05:55
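
The "is the pod present, and in what state" precheck from the comments could be scripted roughly as follows. This is a sketch under assumptions, not a fix for the bug: `pod_phase` is a hypothetical helper, and `FakeCoreV1` and `NotFound` are stand-ins so the example runs without a cluster. The pod name `job-a` and namespace `airflow` are made up for illustration.

```python
from types import SimpleNamespace


def pod_phase(core_v1, name, namespace):
    """Return the pod's phase, or None if the API answers 404 Not Found."""
    try:
        pod = core_v1.read_namespaced_pod(name=name, namespace=namespace)
    except Exception as exc:
        # The Kubernetes client's ApiException carries the HTTP code in .status
        if getattr(exc, "status", None) == 404:
            return None
        raise
    return pod.status.phase


class NotFound(Exception):
    """Stand-in for an API error with HTTP status 404 (illustration only)."""
    status = 404


class FakeCoreV1:
    """Stand-in for a CoreV1Api client (illustration only)."""

    def __init__(self, pods):
        self.pods = pods  # maps (namespace, name) -> phase

    def read_namespaced_pod(self, name, namespace):
        if (namespace, name) not in self.pods:
            raise NotFound(f'pods "{name}" not found')
        phase = self.pods[(namespace, name)]
        return SimpleNamespace(status=SimpleNamespace(phase=phase))


api = FakeCoreV1({("airflow", "job-a"): "Succeeded"})
print(pod_phase(api, "job-a", "airflow"))    # existing pod
print(pod_phase(api, "missing", "airflow"))  # pod already deleted
```

Against a real cluster, `core_v1` would presumably be a `kubernetes.client.CoreV1Api()` created after `kubernetes.config.load_kube_config()` (or `load_incluster_config()` when running inside the cluster). A `None` result right after the task finishes would suggest the pod was deleted before the operator made its last API call.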

0 Answers