
I am confused by what airflow does if a dagrun fails. The behaviour I want to achieve is:

  1. Regular triggers of the DAG (hourly)
  2. Retries for the task
  3. If a task fails n retries, send an email about the failure
  4. When the next hourly trigger comes round, trigger a new dagrun as if nothing had failed.

These are my dag arguments and task arguments:

task defaults:

'depends_on_past': True,
'start_date': airflow.utils.dates.days_ago(2),
'email': ['email@address.co.uk'],
'email_on_failure': True,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
'wait_for_downstream': False,

dag arguments:

schedule_interval=timedelta(minutes=60),
catchup=False,
max_active_runs=1
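
Roughly, the DAG is put together like this (the dag_id and the tasks here are placeholders, not my real pipeline):

from datetime import timedelta

import airflow
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'depends_on_past': True,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['email@address.co.uk'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'wait_for_downstream': False,
}

dag = DAG(
    dag_id='hourly_pipeline',               # placeholder name
    default_args=default_args,
    schedule_interval=timedelta(minutes=60),
    catchup=False,
    max_active_runs=1,
)

# Placeholder tasks standing in for the real hourly work.
task_a = DummyOperator(task_id='task_a', dag=dag)
task_b = DummyOperator(task_id='task_b', dag=dag)

task_a >> task_b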

I think I am misunderstanding some of these arguments, because it appears to me that if a task fails n times (i.e. the DagRun fails), then the next DagRun gets scheduled but just sits in the running state forever, and no further DagRuns ever succeed (or even get scheduled). For example, here are the DagRuns (I didn't know where to find the text-based scheduler logs like in this question), where the DAG is scheduled to run every 5 minutes instead of every hour:

[screenshot of the DagRuns list in the Airflow UI]

The DAG runs every 5 minutes until the failure; after that, the last run just sits in the running state and has done so for the past 30 minutes.

What have I done wrong?

I should add that restarting the scheduler doesn't help and neither does manually setting that running task to failed...

Dan

2 Answers


You have depends_on_past set to True, which is preventing the next DagRun from being started.

From the docs: depends_on_past (bool) – when set to true, task instances will run sequentially while relying on the previous task’s schedule to succeed. The task instance for the start_date is allowed to run.

This means that your Dag is trying to run, but it is waiting until the corresponding task from the previous DagRun has a success state.
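
For example, a minimal sketch of default_args with that switch flipped (the other values are just carried over from the question; ordering inside a single DagRun still comes from the task dependencies, e.g. task_a >> task_b):

from datetime import timedelta

import airflow

default_args = {
    'depends_on_past': False,   # each scheduled DagRun starts regardless of the previous run's outcome
    'start_date': airflow.utils.dates.days_ago(2),
    'email_on_failure': True,   # an email is still sent once the retries are exhausted
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}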

Viraj Parekh
  • So I interpreted that as the tasks *within* a dagrun should depend on the upstream tasks to run, but that they should be independent between different dagruns. So what settings do I need so that tasks depend on upstream tasks but **not** on previous dagruns? – Dan Aug 31 '18 at 09:23
  • 1
    Do I need to set the trigger rule of the most upstream task (i.e. the first task in the dag) to `all-done`? – Dan Aug 31 '18 at 09:43
  • 1
    So I have set `'depends_on_past': False` and set task 1 to have `trigger_rule='all_done'`. Now the new dagruns do continue after a failure, but now I'm no longer receiving emails for the failed tasks! – Dan Aug 31 '18 at 12:31
  • Emails came through, but some were hours late. Otherwise it seems to be working. Thanks – Dan Aug 31 '18 at 13:13

This question has given me a lot of headaches, so I want to post a complete solution.

In my case, the next DagRun was not starting when the previous execution failed, even though I had depends_on_past = False. This was because the wait_for_downstream option was True, and that combination is incompatible. According to the documentation:

wait_for_downstream (bool) - when set to true, an instance of task X will wait for tasks immediately downstream of the previous instance of task X to finish successfully before it runs. This is useful if the different instances of a task X alter the same asset, and this asset is used by tasks downstream of task X. Note that depends_on_past is forced to True wherever wait_for_downstream is used.

Finally, note that it is important to set max_active_runs = 1, because otherwise the same task can start running simultaneously in several subsequent DAG runs.

from datetime import datetime, timedelta
from time import sleep

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'wait_for_downstream': False,
    'start_date': datetime(2019, 7, 20),
}

dag = DAG(
    dag_id='test_v8',
    default_args=args,
    schedule_interval='* * * * *',  # every minute, to exercise the scheduling quickly
    catchup=False,
    max_active_runs=1,
)


def sleep_1():
    sleep(1)


def sleep_2():
    sleep(2)


# Note: these assignments rebind sleep_1/sleep_2 from the callables above to the
# task objects; python_callable still captures the original functions.
sleep_1 = PythonOperator(
    task_id='sleep_1',
    python_callable=sleep_1,
    dag=dag,
)

sleep_2 = PythonOperator(
    task_id='sleep_2',
    python_callable=sleep_2,
    dag=dag,
)

sleep_1 >> sleep_2

Finally, that did the trick!

[screenshot of the resulting DagRuns in the Airflow UI]

alvaro nortes
  • What if you set `wait_for_downstream` to false for `sleep1` but to true for `sleep2`? Because otherwise won't `sleep2` run simultaneously with `sleep1`? – Dan Aug 07 '19 at 13:01
  • Hi Dan, I think `wait_for_downstream` is a DAG-level option, so it couldn't be different for every task in the same DAG. sleep_2 and sleep_1 can't run simultaneously because the precedence between them is set with the statement sleep_1 >> sleep_2 – alvaro nortes Aug 07 '19 at 13:09
  • Haven't used airflow in a while so could be wrong: DAG-level args are kwargs in the `DAG` init, but `default_args` set defaults on each task, which you can then override on the tasks. `sleep1 >> sleep2` is the same as setting which tasks are downstream. But I think in this case I was mixing up upstream and downstream. I *think* in your case `sleep2` is downstream, so by setting wait for downstream to false on `sleep1`, you are telling `sleep1` it doesn't have to wait for the previous dagrun to finish before starting another. You could test by making the `sleep` longer than the `schedule_interval` (a sketch of that kind of per-task override follows below) – Dan Aug 07 '19 at 13:17
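
A hedged sketch of the per-task override discussed in the comments above (the dag_id and the choice of which task gets the override are illustrative; per the docs, wait_for_downstream=True also forces depends_on_past=True for that task):

from datetime import datetime
from time import sleep

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id='test_per_task_override',   # illustrative name
    default_args={
        'owner': 'airflow',
        'start_date': datetime(2019, 7, 20),
        'wait_for_downstream': False,  # default for every task in this DAG
    },
    schedule_interval='* * * * *',
    catchup=False,
    max_active_runs=1,
)

sleep_1 = PythonOperator(
    task_id='sleep_1',
    python_callable=lambda: sleep(1),
    dag=dag,
)

# Per-task override: only sleep_2 waits for the downstream tasks of its
# previous instance, overriding the default_args value above.
sleep_2 = PythonOperator(
    task_id='sleep_2',
    python_callable=lambda: sleep(2),
    wait_for_downstream=True,
    dag=dag,
)

sleep_1 >> sleep_2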