
I have a newly created daily DAG that I set up yesterday (Jan. 25th). Once it was loaded by Airflow, I could see it ran once (scheduled_2021-01-24T00:00:00+00:00), and then I triggered it manually once just to see if it works, and it did (manual_2021-01-25).

It is now 08:24 UTC on Jan 26th, but I don't see any run for 01-25. Using airflow dags next-execution, I found that Airflow plans to execute the DAG for 01-26 directly, presumably at 01-27 00:00 UTC. So it will skip 01-25 entirely.

Why does Airflow behave this way? Is there a reason behind it?


Bob Fang
  • You might want to read https://stackoverflow.com/questions/65196414/problem-with-start-date-and-scheduled-date-in-apache-airflow/65196624#65196624 – Elad Kalif Jan 27 '21 at 18:31

3 Answers


This is THE most difficult concept to grasp in Airflow. Once you get it, the rest of the system is fairly straightforward. But this one design decision is brutal; I have seen it bring seasoned engineers to their knees, sobbing in fits of rage.

As another answer quotes from the Airflow docs, Airflow runs your job at the end of the period. This is easiest for me to visualize with a DAG on a daily schedule. The DAG run for 01/01/2021, scheduled at 00:01 AM, will not execute until 01/02/2021 00:01 AM.
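To make the rule concrete, here is a minimal Python sketch, assuming a fixed schedule_interval expressed as a timedelta. The helper trigger_time is hypothetical and only illustrates the arithmetic; it is not an Airflow API. A run labelled with a given execution date is triggered once one full interval has passed.

```python
from datetime import datetime, timedelta


def trigger_time(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Hypothetical helper: when a run labelled `execution_date` is triggered.

    Under the end-of-period rule, the run only starts after its whole
    interval [execution_date, execution_date + schedule_interval) is over.
    """
    return execution_date + schedule_interval


run_date = datetime(2021, 1, 1, 0, 1)  # the 01/01/2021 00:01 AM run
print(trigger_time(run_date, timedelta(days=1)))  # 2021-01-02 00:01:00
```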

The confusing part of this is WHY!? When you stop to think about why Airflow was written, it begins to make sense. This execution pattern ensures that the data for the run date 01/01/2021 is complete and ready when your orchestration pipeline runs to act on it. Think about it as a business process. If you are a business analyst and come into work on 01/02/2021, you will be looking at data from the day before, not data from today. The data from today has not been collected yet.

The same pattern is true for weekly or monthly intervals as well. The data for that week or month is not going to be ready to act on until the end of the period.
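For monthly schedules the interval is not a fixed number of days, so a plain timedelta no longer works. A rough sketch, assuming each execution date falls at the start of its period (as with Airflow's @weekly and @monthly presets); period_end and next_month_start are hypothetical names used only for illustration:

```python
from datetime import datetime, timedelta


def next_month_start(d: datetime) -> datetime:
    """First instant of the calendar month after d's month."""
    # replace() applies all fields at once, so day=1 avoids invalid dates
    # such as February 31st.
    if d.month == 12:
        return d.replace(year=d.year + 1, month=1, day=1,
                         hour=0, minute=0, second=0, microsecond=0)
    return d.replace(month=d.month + 1, day=1,
                     hour=0, minute=0, second=0, microsecond=0)


def period_end(execution_date: datetime, interval: str) -> datetime:
    """Hypothetical helper: the earliest time the run for this period can start."""
    if interval == "daily":
        return execution_date + timedelta(days=1)
    if interval == "weekly":
        return execution_date + timedelta(weeks=1)
    if interval == "monthly":
        return next_month_start(execution_date)
    raise ValueError(f"unsupported interval: {interval}")


print(period_end(datetime(2021, 1, 1), "monthly"))  # 2021-02-01 00:00:00
```

The January run only starts once January is over, just as the 01/01/2021 daily run only starts once that day is over.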

This also makes more sense when you start using the macros and jinja templating.
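As an illustration of how templating lines up with this, the sketch below imitates a few of Airflow's template variables (ds, ds_nodash, next_ds) for a daily schedule in plain Python. It is not Airflow's implementation, and the events table and query are hypothetical. The point is that the run which fires around 2021-01-02 00:00 renders {{ ds }} as 2021-01-01, the day whose data is now complete.

```python
from datetime import datetime, timedelta


def template_context(execution_date: datetime, schedule_interval: timedelta) -> dict:
    """Rough imitation of a few Airflow template variables (not Airflow's code)."""
    next_execution_date = execution_date + schedule_interval
    return {
        "ds": execution_date.strftime("%Y-%m-%d"),
        "ds_nodash": execution_date.strftime("%Y%m%d"),
        "next_ds": next_execution_date.strftime("%Y-%m-%d"),
    }


# The run with execution date 2021-01-01 actually fires around 2021-01-02 00:00...
ctx = template_context(datetime(2021, 1, 1), timedelta(days=1))

# ...and a query templated on ds targets the day that has just finished.
query = "SELECT * FROM events WHERE event_date = '{ds}'".format(**ctx)
print(query)  # SELECT * FROM events WHERE event_date = '2021-01-01'
```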

Hopefully this is now clear as Mud.

trejas
  • btw, there is a GitHub issue to add an option to change this behaviour so it runs at the beginning of the period (01/01/2021). Hopefully, they'll push that out this year. – Gabe Jan 27 '21 at 17:12
  • @Gabe interesting. Seems like that would make this even more confusing. The way this currently works with jinja inserts and macros is great, just not completely intuitive when you first encounter it. – trejas Jan 27 '21 at 17:16
  • Looks like I was mistaken here; I misunderstood the problem the OP was having. Disregard this answer. – trejas Feb 26 '21 at 23:00

This is actually a bug in the Airflow 2.0.0 release, which was fixed in 2.0.1: https://github.com/apache/airflow/issues/13434
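Under the end-of-period rule, the OP's expected schedule can be checked with a short sketch. It assumes a start_date of 2021-01-24 00:00 UTC (the question does not state the actual value) and uses a hypothetical due_execution_dates helper rather than any Airflow API. By 08:24 UTC on Jan 26, the intervals for both 01-24 and 01-25 have closed, so a scheduled 2021-01-25 run would be expected to exist rather than be skipped.

```python
from datetime import datetime, timedelta


def due_execution_dates(start_date: datetime, now: datetime,
                        interval: timedelta = timedelta(days=1)) -> list:
    """Hypothetical helper: execution dates whose interval has fully closed by `now`.

    Naive datetimes are treated as UTC here for simplicity.
    """
    dates = []
    execution_date = start_date
    # A run is due once its whole interval lies in the past.
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    return dates


start = datetime(2021, 1, 24)        # assumed start_date
now = datetime(2021, 1, 26, 8, 24)   # "08:24 UTC Jan 26th" from the question
for d in due_execution_dates(start, now):
    print(d.date())
# 2021-01-24
# 2021-01-25
```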

Elad Kalif
Bob Fang

This is a feature of Airflow that confused me too, in the beginning. From the Airflow docs:

If you run a DAG on a schedule_interval of one day, the run with execution_date 2019-11-21 triggers soon after 2019-11-21T23:59.

Let’s Repeat That, the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.

SergiyKolesnikov