
If the cron expression for my Airflow DAG is: 30 0 * * *, then why do my DAG runs show an execution date of the previous day?

I am using Airflow 1.10.10. In the DAG, I have PostgresOperators running SQL on a database. The SQL contains filters on a date column, and I'm filtering using the {{ ds_nodash }} macro. But, the ds_nodash macro resolves to yesterday!

Here's the webserver view of the dag run dates:

Here you can see the start date and execution date

  • (I'm assuming that the date in the Run Id, scheduled__2021-02-21T00:30:00+00:00, is the DAG run's execution date, based on the behavior I describe above.)

My expectation is that the execution date should be the same as, or very close to, the start date, based on the cron interval expression. Is my assumption incorrect? If so, why?

cdabel

1 Answer


As you described, the run_id is created from the execution_date. Your SQL query probably needs to be:

WHERE date_col BETWEEN {{ ds_nodash }} AND {{ next_ds_nodash }}

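The reason is that in ETL you specify the time window you want to query, but the data for that window is complete only at the end of the interval. Airflow therefore triggers a run at the end of the interval it covers and stamps that run with the start of the interval as its execution_date. With your schedule, the run with execution date 2021-02-21T00:30 covers the interval up to 2021-02-22T00:30, so it can actually be executed only on 2021-02-22.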
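To make the timing concrete, here is a minimal sketch in plain Python (no Airflow import) of how the execution date relates to the time a run can actually start. It assumes a fixed one-day interval as a stand-in for the cron expression 30 0 * * *, and `describe_run` is a hypothetical helper written for this illustration, not part of Airflow.

```python
from datetime import datetime, timedelta

# Assumption: a fixed daily interval standing in for the cron "30 0 * * *".
SCHEDULE_INTERVAL = timedelta(days=1)


def describe_run(execution_date):
    """Hypothetical helper: data window and earliest start for one run."""
    window_start = execution_date
    window_end = execution_date + SCHEDULE_INTERVAL
    return {
        # What {{ ds_nodash }} renders to: the start of the window.
        "ds_nodash": window_start.strftime("%Y%m%d"),
        # What {{ next_ds_nodash }} renders to: the end of the window.
        "next_ds_nodash": window_end.strftime("%Y%m%d"),
        # The run cannot start before its window has closed.
        "earliest_start": window_end,
    }


run = describe_run(datetime(2021, 2, 21, 0, 30))
print(run["ds_nodash"])       # 20210221
print(run["next_ds_nodash"])  # 20210222
print(run["earliest_start"])  # 2021-02-22 00:30:00
```

So a run that starts on 2021-02-22 at 00:30 renders ds_nodash as 20210221, which matches what you see. If your filter should target the day the run actually starts, next_ds_nodash is probably the value you want.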

Possibly this answer may provide more information about the scheduling.

Since this is quite confusing for many users, there is a discussion on the dev mailing list about addressing it, so this behavior is expected to change in future Airflow versions.

Elad Kalif