
I'm studying the Airflow documentation to better understand its scheduler mechanism, and I came across the example below.

The doc states that when the DAG is picked up by the scheduler on 2016-01-02 at 6 AM, a single DAG Run will be created with an execution_date of 2016-01-01, and the next one will be created just after midnight on the morning of 2016-01-03 with an execution_date of 2016-01-02.

The schedule interval is set to hourly, and the execution date refers to the start of the period at whose end the DAG is run. So why isn't the execution date just one hour before 2016-01-02 6 AM, the time at which the scheduler picks up the DAG?

"""
Code that goes along with the Airflow tutorial located at:
https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 12, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': '@hourly',
}

dag = DAG('tutorial', catchup=False, default_args=default_args)

I created a basic DAG; its run info is in the screenshot below. I set schedule_interval to 50 * * * *. When the scheduler picked up the DAG, the clock showed about 10:58, which was already past 10:50. The DAG was triggered immediately, and because 10:50 had already passed, its execution date was 2021-04-25 09:50. So its execution date falls on the same day it was triggered, because it is scheduled at minute 50 of every hour.

In Airflow, @hourly corresponds to 0 * * * *, so its schedule is similar: it is triggered at minute 0 of every hour. Yet the doc gives its execution date as 2016-01-01. I think it should have been 2016-01-02 5 AM: the DAG is triggered every hour, so when it is triggered at 6 AM, the interval it covers starts at 2016-01-02 5 AM.

[Screenshot: DAG run details]
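The two cases described above can be compared with a small stand-alone model. This is a simplified sketch, not Airflow's scheduler code: it assumes a fixed-length schedule (one hour, one day) whose intervals are aligned to a chosen `anchor` time, and `latest_execution_date` is a hypothetical helper. It returns the execution_date of the single run a `catchup=False` DAG gets when first picked up, i.e. the start of the most recent interval that has already ended.

```python
from datetime import datetime, timedelta


def latest_execution_date(now: datetime, period: timedelta, anchor: datetime) -> datetime:
    """Return the start of the most recent interval that has fully ended by `now`.

    Simplified model (hypothetical helper, not an Airflow API): intervals are
    [anchor + k*period, anchor + (k+1)*period) for k = 0, 1, 2, ...
    """
    if now < anchor + period:
        raise ValueError("no interval has ended yet")
    # timedelta // timedelta gives the number of whole periods elapsed since anchor
    k = (now - anchor) // period
    interval_end = anchor + k * period  # latest interval boundary at or before now
    return interval_end - period        # execution_date = start of that interval


if __name__ == "__main__":
    picked_up = datetime(2016, 1, 2, 6, 0)
    start = datetime(2015, 12, 1)
    # Hourly intervals aligned to minute 0 -> 2016-01-02 05:00:00
    print(latest_execution_date(picked_up, timedelta(hours=1), start))
    # Daily intervals -> 2016-01-01 00:00:00
    print(latest_execution_date(picked_up, timedelta(days=1), start))
    # "50 * * * *" picked up at 10:58, modelled as hourly intervals aligned to :50
    # -> 2021-04-25 09:50:00
    print(latest_execution_date(datetime(2021, 4, 25, 10, 58),
                                timedelta(hours=1),
                                datetime(2021, 4, 25, 0, 50)))
```

Under this model an hourly schedule picked up at 6 AM yields 2016-01-02 05:00, and only a daily schedule yields 2016-01-01.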

tkarahan

1 Answer


Airflow runs DAGs at the end of their interval. Thus, with a 24-hour interval, the run for 2016-01-01 starts on 2016-01-02. This is consistent with how data pipelines are authored: today you process yesterday's data.
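The "run at the end of the interval" rule can be sketched as a simplified catch-up model (not Airflow's implementation; `scheduled_runs` is a hypothetical helper and a fixed-length period is assumed): each run is labelled with the start of its interval and becomes due only once that interval has ended.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def scheduled_runs(start_date: datetime, period: timedelta,
                   now: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (execution_date, run_start) for every interval that has ended by `now`.

    Simplified model (hypothetical helper, not an Airflow API): the run for the
    interval [execution_date, execution_date + period) starts at its end.
    """
    execution_date = start_date
    while execution_date + period <= now:
        yield execution_date, execution_date + period
        execution_date += period


if __name__ == "__main__":
    # Daily schedule: each day's data is processed once that day is over.
    for ed, run_start in scheduled_runs(datetime(2016, 1, 1), timedelta(days=1),
                                        datetime(2016, 1, 3, 6)):
        print(f"execution_date={ed:%Y-%m-%d} runs at {run_start:%Y-%m-%d %H:%M}")
```

With these inputs the run labelled 2016-01-01 starts at 2016-01-02 00:00, and the one labelled 2016-01-02 starts at 2016-01-03 00:00.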

You can read more about it in the following answers:

https://stackoverflow.com/a/65196624/14624409

https://stackoverflow.com/a/66288641/14624409

Elad Kalif
  • Thanks, but schedule_interval is set to hourly, and that is what confused me. I know that DAG runs actually start at the end of the interval, so I thought that if Airflow picks up the DAG at 2016-01-02 6 AM, it must run the DAG with an execution date equal to the start of the interval, 2016-01-02 5 AM, because the schedule interval is set to hourly. – tkarahan Apr 25 '21 at 10:01
  • @tkarahan It doesn't really matter; the logic is the same. I think I might not fully understand your question. The link I provided to another Stack Overflow question gives an example for an hourly DAG; is it still not clear enough? Could you please provide a screenshot from the UI showing the DAG run details and what exactly doesn't match your expectations? – Elad Kalif Apr 25 '21 at 10:37
  • I examined the links, and what I understood is that if you schedule your DAG as 0 8 * * *, it will be triggered at 08:00 the next day, because that is the end of its interval. In my case @hourly corresponds to 0 * * * *, so it is scheduled not once every day but once every hour. Hence the start of its interval must be one hour before its trigger time. If it is picked up by the scheduler at 2016-01-02 6 AM and triggered then, its execution date must be 2016-01-02 5 AM, because it is scheduled hourly. I also tested this logic, and it seems consistent with the screenshot. I wonder whether the doc contains incorrect information. – tkarahan Apr 25 '21 at 11:21
  • @tkarahan Could it be that you are confused by the first run when you provide a `start_date` in the past together with `catchup=False`? See https://stackoverflow.com/a/67161656/14624409 – Elad Kalif Apr 25 '21 at 11:31
  • I have the same question: is the doc wrong? – Michael Li Dec 07 '22 at 00:33