I am trying to create a dependency between multiple DAGs. Let's say Dag_A and Dag_B run every day at 02:15 and 02:30 UTC respectively. Now I want to run Dag_C at 02:30 with two sensors (ExternalTaskSensors), one for each of the DAGs above. I am also using the execution_date_fn parameter, which provides two execution dates for each DAG, so each sensor checks both the 02:15 and the 02:30 run. But the sensors keep waiting and never succeed; they go into up_for_reschedule.
Am I doing anything wrong? Please suggest how to deal with such cases.
I am using Airflow version 2.
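The snippets below omit the imports and DEFAULT_ARGS for brevity; they assume roughly the following (a sketch — the actual defaults in my project may differ):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.sensors.external_task import ExternalTaskSensor

# Illustrative defaults; the real DEFAULT_ARGS may differ
DEFAULT_ARGS = {
    "owner": "airflow",
    "start_date": datetime(2022, 5, 19),
    "retries": 1,
}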
Below is the code for each DAG.
DAG_A:
with DAG(
    dag_id="dag_a",
    default_args=DEFAULT_ARGS,
    max_active_runs=1,
    schedule_interval="15 2 * * *",
    catchup=True,
) as dag:
    dummy_task = DummyOperator(task_id="Task_A")
DAG_B:
with DAG(
    dag_id="dag_b",
    default_args=DEFAULT_ARGS,
    max_active_runs=1,
    schedule_interval="30 2 * * *",
    catchup=True,
) as dag:
    dummy_task = DummyOperator(task_id="Task_B")
DAG_C:
with DAG(
    dag_id="dag_c",
    default_args=DEFAULT_ARGS,
    max_active_runs=1,
    schedule_interval="30 2 * * *",
    catchup=True,
) as dag:
    wait_task_a = ExternalTaskSensor(
        task_id="wait_for_task_a",
        external_dag_id="dag_a",
        # check the runs at dt and dt - 15 minutes, i.e. 02:30 and 02:15
        execution_date_fn=lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)],
        timeout=60 * 60 * 3,  # 3 hours
        poke_interval=60,  # 1 minute
        mode="reschedule",
    )
    wait_task_b = ExternalTaskSensor(
        task_id="wait_for_task_b",
        external_dag_id="dag_b",
        execution_date_fn=lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)],
        timeout=60 * 60 * 3,  # 3 hours
        poke_interval=60,  # 1 minute
        mode="reschedule",
    )
    dummy_task = DummyOperator(task_id="Task_C")

    wait_task_a >> dummy_task
    wait_task_b >> dummy_task
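For clarity, this is what the execution_date_fn lambda above evaluates to for a Dag_C run (a quick standalone check; a plain datetime stands in for the pendulum datetime Airflow actually passes):

from datetime import datetime, timedelta

fn = lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)]

# for the dag_c run at 02:30 UTC, the sensor pokes for these two dates:
print(fn(datetime(2022, 5, 19, 2, 30)))
# [datetime.datetime(2022, 5, 19, 2, 30), datetime.datetime(2022, 5, 19, 2, 15)]

So each sensor waits for both the 02:30 and the 02:15 run of its upstream DAG, which matches the two dates in the log below.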
Sensor logs: the sensor keeps poking even though the upstream task instances exist.
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1043} INFO - Dependencies all met for <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [queued]>
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1043} INFO - Dependencies all met for <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [queued]>
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1249} INFO -
--------------------------------------------------------------------------------
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1250} INFO - Starting attempt 1 of 2
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1251} INFO -
--------------------------------------------------------------------------------
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1270} INFO - Executing <Task(ExternalTaskSensor): wait_for_task_b> on 2022-05-19 02:30:00+00:00
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:52} INFO - Started process 17603 to run task
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'dag_c', 'wait_for_task_b', 'scheduled__2022-05-19T02:30:00+00:00', '--job-id', '4', '--raw', '--subdir', 'DAGS_FOLDER/sample/dagc.py', '--cfg-path', '/var/folders/q1/dztb0bzn0fn8mvfm7_q9ms0m0000gn/T/tmpb27mns7u', '--error-file', '/var/folders/q1/dztb0bzn0fn8mvfm7_q9ms0m0000gn/T/tmpc6y4_6cx']
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:80} INFO - Job 4: Subtask wait_for_task_b
[2022-05-23, 16:25:25 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [running]> on host yahoo-MacBook-Pro.local
[2022-05-23, 16:25:30 UTC] {taskinstance.py:1448} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=dag_c
AIRFLOW_CTX_TASK_ID=wait_for_task_b
AIRFLOW_CTX_EXECUTION_DATE=2022-05-19T02:30:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-05-19T02:30:00+00:00
[2022-05-23, 16:25:30 UTC] {external_task.py:175} INFO - Poking for tasks None in dag dag_b on 2022-05-19T02:30:00+00:00,2022-05-19T02:15:00+00:00 ...
[2022-05-23, 16:25:30 UTC] {taskinstance.py:1726} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-05-23, 16:25:30 UTC] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-05-23, 16:25:30 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check