I am trying to create dependencies between multiple DAGs.

Let's say Dag_A and Dag_B run every day at 02:15 and 02:30 UTC, respectively.

Now I want to run Dag_C at 02:30 with two sensors (ExternalTaskSensors), one for each of the above DAGs. I am also using the execution_date_fn parameter, which provides two execution dates for each of the above DAGs, so each sensor checks both 02:30 and 02:15. But the sensor keeps waiting and never succeeds; it goes to up_for_reschedule.
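For clarity, here is what the execution_date_fn lambda below evaluates to for the 02:30 run (a minimal standalone sketch of the same expression, using 2022-05-19 as an example date):

from datetime import datetime, timedelta

# Same expression as the execution_date_fn lambda in DAG_C below,
# evaluated for the 2022-05-19 02:30 UTC run as an illustration.
dt = datetime(2022, 5, 19, 2, 30)
print([dt + timedelta(minutes=-i) for i in range(0, 30, 15)])
# [datetime.datetime(2022, 5, 19, 2, 30), datetime.datetime(2022, 5, 19, 2, 15)]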

Am I doing anything wrong? Please suggest how to deal with such cases.

I am using Airflow version 2.

Below is the code for each DAG.

DAG_A:

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
        dag_id="dag_a",
        default_args=DEFAULT_ARGS,
        max_active_runs=1,
        schedule_interval="15 2 * * *",  # daily at 02:15 UTC
        catchup=True
) as dag:
    dummy_task = DummyOperator(task_id="Task_A")

DAG_B:

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
        dag_id="dag_b",
        default_args=DEFAULT_ARGS,
        max_active_runs=1,
        schedule_interval="30 2 * * *",  # daily at 02:30 UTC
        catchup=True
) as dag:
    dummy_task = DummyOperator(task_id="Task_B")

DAG_C:

from datetime import timedelta

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
        dag_id="dag_c",
        default_args=DEFAULT_ARGS,
        max_active_runs=1,
        schedule_interval="30 2 * * *",  # daily at 02:30 UTC
        catchup=True
) as dag:
    wait_task_a = ExternalTaskSensor(
        task_id="wait_for_task_a",
        external_dag_id="dag_a",
        # checks the run's own execution date and 15 minutes earlier
        execution_date_fn=lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)],
        timeout=60 * 60 * 3,  # 3 hours
        poke_interval=60,  # 1 minute
        mode="reschedule"
    )
    wait_task_b = ExternalTaskSensor(
        task_id="wait_for_task_b",
        external_dag_id="dag_b",
        # checks the run's own execution date and 15 minutes earlier
        execution_date_fn=lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)],
        timeout=60 * 60 * 3,  # 3 hours
        poke_interval=60,  # 1 minute
        mode="reschedule"
    )
    dummy_task = DummyOperator(task_id="Task_C")
    wait_task_a >> dummy_task
    wait_task_b >> dummy_task

Sensor logs (the sensor keeps poking even though the upstream runs are present):

[2022-05-23, 16:25:20 UTC] {taskinstance.py:1043} INFO - Dependencies all met for <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [queued]>
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1043} INFO - Dependencies all met for <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [queued]>
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1249} INFO - 
--------------------------------------------------------------------------------
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1250} INFO - Starting attempt 1 of 2
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1251} INFO - 
--------------------------------------------------------------------------------
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1270} INFO - Executing <Task(ExternalTaskSensor): wait_for_task_b> on 2022-05-19 02:30:00+00:00
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:52} INFO - Started process 17603 to run task
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'dag_c', 'wait_for_task_b', 'scheduled__2022-05-19T02:30:00+00:00', '--job-id', '4', '--raw', '--subdir', 'DAGS_FOLDER/sample/dagc.py', '--cfg-path', '/var/folders/q1/dztb0bzn0fn8mvfm7_q9ms0m0000gn/T/tmpb27mns7u', '--error-file', '/var/folders/q1/dztb0bzn0fn8mvfm7_q9ms0m0000gn/T/tmpc6y4_6cx']
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:80} INFO - Job 4: Subtask wait_for_task_b
[2022-05-23, 16:25:25 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [running]> on host yahoo-MacBook-Pro.local
[2022-05-23, 16:25:30 UTC] {taskinstance.py:1448} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=dag_c
AIRFLOW_CTX_TASK_ID=wait_for_task_b
AIRFLOW_CTX_EXECUTION_DATE=2022-05-19T02:30:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-05-19T02:30:00+00:00
[2022-05-23, 16:25:30 UTC] {external_task.py:175} INFO - Poking for tasks None in dag dag_b on 2022-05-19T02:30:00+00:00,2022-05-19T02:15:00+00:00 ... 
[2022-05-23, 16:25:30 UTC] {taskinstance.py:1726} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-05-23, 16:25:30 UTC] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-05-23, 16:25:30 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
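To compare those dates against the runs that actually exist, I also listed the recorded runs for each upstream DAG (a debugging sketch, assuming it is executed inside the same Airflow environment; DagRun.find is the standard Airflow 2 helper):

from airflow.models import DagRun

# Print every run Airflow has recorded for the upstream DAGs, so the dates
# returned by execution_date_fn can be checked against real runs.
for dag_id in ("dag_a", "dag_b"):
    for run in DagRun.find(dag_id=dag_id):
        print(dag_id, run.execution_date, run.state)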

  • In the sensor logs, you can see which tasks and corresponding dates were checked. That might show an incorrect task/execution date? Also, "It's not working" is unfortunately not very helpful. Please share a representative code sample. – Bas Harenslak May 23 '22 at 10:27
  • I have added more information – yahoo May 23 '22 at 11:01
