
I'm trying to make my DAGs run every Monday at 08:00 AM. For this purpose, I have defined the corresponding schedule interval: schedule_interval='0 8 * * 1'.

However, two problems arise, which are likely due to the same issue:

  • My DAGs never seem to trigger
  • When I force the DAGs to run, they always run for the previous Monday, e.g. if I force the start today (21-10-2021) it will actually trigger a run for the previous week's Monday, 11-10-2021.

Why does this occur and how can I fix it?

Peterson Davis

1 Answer


It's not delayed. Airflow schedules tasks at the END of the interval: the run for a given interval starts only once that interval has fully passed. You can check this answer for more details. This behavior makes sense in the ETL domain, as you normally run ETL at the end of a specific time interval. To give an example: today you are parsing yesterday's data.
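To make the end-of-interval rule concrete, here is a minimal sketch in plain Python (no Airflow involved) for the weekly schedule '0 8 * * 1' from the question. The helper names latest_schedule_point and latest_run are hypothetical, the datetimes are naive, and the code only models the pre-2.2 behavior described above, assuming the execution date is the start of the interval and the run is triggered at its end.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def latest_schedule_point(now: datetime) -> datetime:
    """Most recent Monday 08:00 that is not after `now` (cron '0 8 * * 1')."""
    monday = now - timedelta(days=now.weekday())  # weekday(): Monday == 0
    point = monday.replace(hour=8, minute=0, second=0, microsecond=0)
    if point > now:
        # It is Monday, but before 08:00: fall back to last week's point.
        point -= WEEK
    return point


def latest_run(now: datetime) -> tuple:
    """Return (execution_date, triggered_at) of the newest scheduled run at `now`.

    The run covers the interval [execution_date, triggered_at) and is only
    triggered once that interval has ended.
    """
    interval_end = latest_schedule_point(now)
    interval_start = interval_end - WEEK
    return interval_start, interval_end


# Thursday 21-10-2021, the day the question was asked.
execution_date, triggered_at = latest_run(datetime(2021, 10, 21, 12, 0))
print(execution_date)  # 2021-10-11 08:00:00
print(triggered_at)    # 2021-10-18 08:00:00
```

Under these assumptions, the newest run on 21-10-2021 carries the execution date 11-10-2021, and the run triggered on Monday 25-10-2021 at 08:00 carries the execution date 18-10-2021.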

That said, Airflow >= 2.2.0 introduced the new concept of Timetables with the completion of AIP-39 Richer scheduler_interval (see the release notes). In simple terms, Airflow decoupled when to run (Timetable) from which interval of time to process (Data Interval), which resolves the issue you are experiencing at the root. You can read the documentation about it here.

Elad Kalif
  • Thank you! Christ, that is strange. So say I want to deploy today (21-10-2021) a DAG which runs every Monday at 08:00 AM, and have it already run next Monday (25-10-2021) with the execution date equal to that day (25-10-2021): what should I do? Set the start date to 18-10-2021, so that by next Monday the full interval will be complete? – Peterson Davis Oct 21 '21 at 19:12
  • At the same time, as far as I can tell, even if it runs on 25-10-2021 the Execution Date will be 18-10-2021, correct? Hence, if I want my code to reflect the actual day and not the lagged execution date, I have to take that into account in my code? – Peterson Davis Oct 21 '21 at 19:13
  • Not any more in Airflow 2.2 if you follow the new approach. But (as Elad mentioned) before that, every "dag run" had to correspond to a data interval and it ran at the end of it. This was (as for you) a major source of confusion for people who wanted to use Airflow as just a "scheduler" rather than a "Data interval" processor, which we responded to and addressed in Airflow 2.2. – Jarek Potiuk Oct 21 '21 at 19:44
  • @PetersonDavis I edited the answer to explain better why Airflow had this behavior to begin with. As explained, this behavior has now changed, and in Airflow 2.2 this problem doesn't exist any more. – Elad Kalif Oct 22 '21 at 07:23
  • Thanks, Elad and @Jarek Potiuk, makes perfect sense. However, Airflow 2.2.0 seems to be filled with bugs (according to GitHub's issues) and Timetables aren't very clear yet. An alternative could be to run the DAG daily at 08:00 AM, check if it's Sunday and, if so, run my DAG (since Monday at 08:00 AM will execute with Sunday's Execution Date), and adjust the execution date by one day, right? – Peterson Davis Oct 22 '21 at 10:30
  • You can't "adjust" execution_date. I guess you are referring to something like https://stackoverflow.com/a/69295057/14624409. I'd like to note that from what you are describing, it feels more like you want to manipulate dates for the data processing that you are doing in your ETL. For that you should check the Airflow macros. You don't need to "change" the execution_date. – Elad Kalif Oct 22 '21 at 16:34
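
As a rough illustration of the last comment, the date arithmetic can live in the task's own code instead of in the execution date. The sketch below is plain Python; shift_ds is a hypothetical stand-in that mirrors what Airflow's ds_add macro does with a YYYY-MM-DD string, and the 7-day offset assumes the weekly schedule from the question.

```python
from datetime import datetime, timedelta


def shift_ds(ds: str, days: int) -> str:
    """Add `days` to a 'YYYY-MM-DD' date string and return the same format."""
    parsed = datetime.strptime(ds, "%Y-%m-%d")
    return (parsed + timedelta(days=days)).strftime("%Y-%m-%d")


# Execution date of the run that is triggered on Monday 25-10-2021.
execution_ds = "2021-10-18"

# For a weekly schedule, the day the run actually starts is one interval later.
print(shift_ds(execution_ds, 7))  # 2021-10-25
```

Inside a templated field, the equivalent expression would be {{ macros.ds_add(ds, 7) }}, which leaves the execution date itself untouched.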