We are using Airflow as orchestrator where it schedule workflow every hour. DataprocSubmitJobOperator is configured to schedule dataproc jobs (it uses spark). Spark sync data from source to target (runs for 50 min and then completes to avoid next schedule overlap).
Intermittent Airflow task fails due to zombie Exception. Logs show assertion failure due to pthread_mutex_lock(mu). Airflow Task exits. Underlying dataproc Job keeps running without issue.
Please suggest what can be potential issue/fix?
[2021-12-22 23:01:17,150] {dataproc.py:1890} INFO - Submitting job
[2021-12-22 23:01:17,804] {dataproc.py:1902} INFO - Job 27a2c88d-1308-4407-b965-aa490e2217fb submitted successfully.
[2021-12-22 23:01:17,805] {dataproc.py:1905} INFO - Waiting for job 27a2c88d-1308-4407-b965-aa490e2217fb to complete
E1222 23:45:58.299007027 1267 sync_posix.cc:67] assertion failed: pthread_mutex_lock(mu) == 0
[2021-12-22 23:46:00,943] {local_task_job.py:102} INFO - Task exited with return code Negsignal.SIGABRT
Config
raw_data_sync = DataprocSubmitJobOperator(
task_id="raw_data_sync",
job=RAW_DATA_GENERATION,
location='us-central1',
project_id='1f780b38bd7b0384e53292de20',
execution_timeout=timedelta(seconds=3420),
dag=dag
)