
I am trying to execute a task 5 minutes after the parent task inside a DAG.

DAG : Task 1 ----> Wait for 5 minutes ----> Task 2

How can I achieve this in Apache Airflow? Thanks in advance.

Spandan Singh

2 Answers


The desired behaviour can be achieved by introducing a task that forces a delay of the specified duration between your Task 1 and Task 2.

One way is with a PythonOperator:

import time
from airflow.operators.python import PythonOperator

delay_python_task: PythonOperator = PythonOperator(task_id="delay_python_task",
                                                   dag=my_dag,
                                                   python_callable=lambda: time.sleep(300))

task_1 >> delay_python_task >> task_2

Or, equivalently, with a BashOperator:

from airflow.operators.bash import BashOperator
delay_bash_task: BashOperator = BashOperator(task_id="delay_bash_task",
                                             dag=my_dag,
                                             bash_command="sleep 5m")
task_1 >> delay_bash_task >> task_2

Note: The given code-snippets are NOT tested
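For context, here is a minimal complete DAG the snippets above could slot into. This is an untested sketch: the `dag_id`, `start_date`, and the `EmptyOperator` placeholders for Task 1 / Task 2 are my assumptions (not from the answer), and it assumes Airflow ≥ 2.3 for `EmptyOperator`.

```python
import time
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

# Sketch only (untested): dag_id, start_date and the EmptyOperator
# placeholders for Task 1 / Task 2 are assumptions for illustration.
my_dag = DAG(dag_id="delay_demo",
             start_date=datetime(2023, 1, 1),
             schedule_interval=None,
             catchup=False)

task_1 = EmptyOperator(task_id="task_1", dag=my_dag)
delay_python_task = PythonOperator(task_id="delay_python_task",
                                   dag=my_dag,
                                   python_callable=lambda: time.sleep(300))
task_2 = EmptyOperator(task_id="task_2", dag=my_dag)

task_1 >> delay_python_task >> task_2
```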



UPDATE-1

Here are some other ways of introducing a delay:

  • UPDATE: do NOT use this, as pointed out by @Vit.ai. Original point: on_success_callback / on_failure_callback: depending on whether Task 2 is supposed to run upon success or failure of Task 1, you can pass lambda: time.sleep(300) in either of these params of Task 1.
  • pre_execute() / post_execute(): invoking time.sleep(300) in Task 1's post_execute() or Task 2's pre_execute() would also have the same effect. Of course, this involves modifying the code of your tasks (1 or 2), so it is better avoided.

Personally, I would prefer the extra-task approach because it makes things more explicit and doesn't falsely exaggerate the runtime of your Task 1 or Task 2.

y2k-shubham
  • I tried this, but it keeps the DAG in the running state. So if my wait time is 2 days, there will be a lot of concurrent DAGs that keep on running, causing the new DAGs to be in the queued state for 2 days. Is there any workaround such that the DAG releases the thread while sleeping? – Spandan Singh Mar 06 '19 at 09:53
  • **@Spandan Singh** i can think of 2 possible workarounds **[1]** have a *continuously running DAG* that triggers other dags at right time using `TriggerDagRunOperator` **[2]** keep triggering your dag frequently and if right time hasn't come yet, then skip execution using either `AirflowSkipException` or `ShortCircuitOperator` – y2k-shubham Mar 06 '19 at 11:46
  • The links are outdated now, one of the corresponding links is - https://github.com/apache/airflow/blob/2c99ec624bd66e9fa38e9f0087d46ef4d7f05aec/airflow/models/baseoperator.py#L594 – akki Jul 03 '19 at 12:22
  • @y2k-shubham But when we do time.sleep it halts the current celery worker from picking up other tasks. This way, if we have 4 celery workers and 4 tasks with a sleep function and 1 task without sleep, then the task without sleep has no worker left to execute it. Correct me if that's wrong. In my opinion a better way to wait would be: in your current python task, check the retry count, and if it is zero then raise an `AirflowException` and apply a retry delay of 5 minutes; this way the task will be in sleep mode for 5 minutes without halting the current worker. – Deepak Tripathi Jul 12 '22 at 11:30
  • **@Deepak Tripathi** I acknowledge this is a gaping hole in the proposed solution. I haven't been using Airflow for a while now, but do checkout [this thread](https://stackoverflow.com/q/59757151/3679900) too – y2k-shubham Jul 12 '22 at 12:25

@y2k-shubham gave the best answer to date; however, I want to warn against using the callback solution, because Airflow first marks the task as a success and then executes the callback, which means task2 will not see any delay. If you don't want to use a separate task, you can use something like this:

< ... >
task1 = DummyOperator(task_id='task1', dag=dag)
task1.post_execute = lambda **x: time.sleep(300)
task2 = DummyOperator(task_id='task2', dag=dag)

task1 >> task2
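A self-contained version of the snippet above might look like the following. This is an untested sketch: the imports and DAG definition standing in for the elided `< ... >` are my assumptions, and it uses `DummyOperator` as the answer does (available as `airflow.operators.dummy` in Airflow 2.x).

```python
# Untested sketch; the imports and DAG definition are assumptions
# filling in the part the answer elides.
import time
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

dag = DAG(dag_id="post_execute_delay",
          start_date=datetime(2023, 1, 1),
          schedule_interval=None,
          catchup=False)

task1 = DummyOperator(task_id='task1', dag=dag)
# Overriding post_execute keeps the delay out of a separate task, but the
# sleep still counts toward task1's measured runtime.
task1.post_execute = lambda **x: time.sleep(300)
task2 = DummyOperator(task_id='task2', dag=dag)

task1 >> task2
```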
Vit.ai