I have the following DAG config:

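import datetime

from airflow import DAG

# default_args is defined elsewhere in the DAG file (not shown here)
with DAG(
    dag_id='dag_example',
    catchup=False,
    start_date=datetime.datetime(2022, 5, 26),
    schedule_interval='0 6,7,9,11,15,19,23 * * *',
    max_active_runs=1,
    default_args=default_args,
) as dag:
    ...  # tasks omitted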

I would like to know why my DAG run that is scheduled for 7 AM actually runs at 9 AM (the next scheduled time). I'm using Airflow 2.1.2. When I was using Airflow v1, the DAG ran correctly.

OdiumPura

1 Answer

This is how Airflow works: a DAG run is triggered at the end of its schedule interval, not at the start. So in your case the run with run_id 2022-05-27 07:00 will start running at 2022-05-27 09:00, because the next scheduled time after 07:00 in your cron expression is 09:00 and Airflow triggers the run once that interval has ended.

Note: This is consistent with batch processing practices.

If you run a daily job, then today you are processing yesterday's data.

If you run an hourly job, then at 10:00 you are processing the interval from 09:00 to 10:00. In other words, the run with run_id 09:00 actually starts at the end of its hourly interval, which is 10:00.
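
To make the end-of-interval rule concrete for the schedule in the question, here is a minimal sketch in plain Python. It does not use Airflow itself; `SCHEDULE_HOURS` and `actual_start` are hypothetical names introduced only for illustration, and the logic simply assumes that a run starts at the next scheduled time after its logical date.

```python
import datetime

# Hours taken from the question's cron expression '0 6,7,9,11,15,19,23 * * *'.
SCHEDULE_HOURS = [6, 7, 9, 11, 15, 19, 23]


def actual_start(logical_date, hours=SCHEDULE_HOURS):
    """Return the next scheduled time after logical_date (when the run starts)."""
    for hour in sorted(hours):
        candidate = logical_date.replace(hour=hour, minute=0, second=0, microsecond=0)
        if candidate > logical_date:
            return candidate
    # No later slot on the same day: roll over to the first slot of the next day.
    next_day = logical_date + datetime.timedelta(days=1)
    return next_day.replace(hour=min(hours), minute=0, second=0, microsecond=0)


logical = datetime.datetime(2022, 5, 27, 7, 0)
print(logical, "->", actual_start(logical))
# prints: 2022-05-27 07:00:00 -> 2022-05-27 09:00:00
```

Under the same assumption, the 23:00 run would not start until 06:00 the following day, since that is the next slot in the schedule.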

You can read "Problem with start date and scheduled date in Apache Airflow" for more information.

If you want to reference a specific interval from your DAG, it is just a question of which macro to use. See the Templates reference.
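
As a rough illustration for Airflow 2.1, the template variables `execution_date` and `next_execution_date` give the start and end of a scheduled run's interval. The sketch below only builds the template string; the constant name `BASH_COMMAND` and the `task_id` in the comment are hypothetical.

```python
# Jinja template string for a templated operator field.
# In Airflow 2.1, execution_date is the start of the interval (the run's
# logical date) and next_execution_date is the end of the interval, which is
# also when a scheduled run actually starts.
BASH_COMMAND = (
    "echo 'processing data from {{ execution_date }} "
    "to {{ next_execution_date }}'"
)

# Assumed usage inside a DAG (the task_id is made up for this example):
#   BashOperator(task_id="show_interval", bash_command=BASH_COMMAND)
print(BASH_COMMAND)
```

Airflow renders the `{{ ... }}` placeholders at run time, so the task sees the interval it is responsible for rather than the wall-clock time at which it happens to start.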

Elad Kalif
  • Thanks for your answer! I changed my schedule according to your answer and it works! Basically I've added one more hour to schedule `(0 4,6,7,9,11,15,19,23 * * *)` – OdiumPura May 28 '22 at 22:55