
I am using Airflow 2.1.4, and the problem is that my tasks randomly receive a SIGTERM. When I check the log from the email alert (first screenshot):

[screenshot: task failure email alert]

The log only shows something like the second screenshot, and the task fails. I am sure I already set retries to 2 (in the DAG's default_args, so every task should have retries=2). Why does the task fail immediately on the first attempt, and why does the log look like this?

[screenshot: task log]

It happens randomly: sometimes the tasks work fine, sometimes they do not.

Environment: Airflow 2.1.4 (upgraded from 2.1.2), using RabbitMQ and MySQL, with 3 workers.
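For reference, the retry setup described above looks roughly like this. It is a minimal sketch; the DAG id, task id, and callable are illustrative, not from my real code:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Illustrative default_args: every task in the DAG inherits
# retries=2, so a failing task should normally be retried twice
# before being marked as failed.
default_args = {
    "owner": "airflow",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_dag",  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    task = PythonOperator(
        task_id="example_task",  # hypothetical task id
        python_callable=lambda: print("hello"),
    )
```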

1 Answer


Apparently, one of the causes was my new worker: we did not stop the scheduler when we added the new worker, and that caused this problem. Here are the steps I took to identify the problem:

Using Flower, I checked, whenever the problem arose, which worker handled the task before and after. From this I learned that the tasks that failed with this log always came from the "new worker". So I reinstalled the new worker, this time making sure to put Airflow into maintenance mode first: I stopped the worker service, the scheduler service, and the webserver service.

After that my Airflow worked normally. But because my DAG runs were always left behind schedule while the server was still healthy (I assume we still have enough resources), I also changed parameters such as worker_concurrency (right now we are using worker_autoscale instead), parallelism, and parsing_processes, as sketched below.
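For illustration, these knobs live in airflow.cfg (each can also be overridden with the matching AIRFLOW__SECTION__KEY environment variable). The values below are placeholders showing where each setting goes, not recommendations:

```ini
# airflow.cfg -- illustrative values, tune for your own hardware

[core]
# Maximum number of task instances allowed to run concurrently
# across the whole Airflow installation.
parallelism = 32

[celery]
# Fixed number of task processes per Celery worker.
worker_concurrency = 16

# Alternatively, let each worker autoscale between max and min
# process counts ("max,min"); when set, this takes precedence
# over worker_concurrency.
worker_autoscale = 16,8

[scheduler]
# Number of processes the scheduler uses to parse DAG files.
parsing_processes = 2
```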

Source on scaling up Airflow: https://www.astronomer.io/guides/airflow-scaling-workers

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/30249364) – user12256545 Nov 05 '21 at 14:27