
I've set up a DAG with the following parameters:

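```python
from datetime import datetime, timedelta

import pendulum
from airflow import DAG

local_tz = pendulum.timezone('US/Eastern')

default_args = {
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    dag_id='some_dag',
    start_date=datetime(2021, 1, 8, tzinfo=local_tz),
    schedule_interval='0 16 8 * *',  # 16:00 on the 8th of every month
    default_args=default_args,
    catchup=True  # create runs for every past interval since start_date
)
```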

I expected the most recent run to be on May 8th; however, I only see February 8th, March 8th, and April 8th. I can't figure out why Airflow stops at April.


It is currently May 25th, so shouldn't the May 8th DAG run have been backfilled along with the other months? To be clear, I deployed this DAG today, so all of the executed DAG runs, including the missing May 8th one, are backfills.

  • May will run in June – drum May 26 '21 at 01:46
  • I don't understand why... according to the docs the first execution is start_date + schedule_interval, and then at every interval. May 8th has already passed, shouldn't it have run in May? – Jonathan Duran May 26 '21 at 01:59
  • As said before, this is the expected behavior: the May interval is not over yet, so the May run will be triggered in June. See [this answer](https://stackoverflow.com/a/66566864/10569220) for an example. My suggestion is to experiment with `start_date` and compare the information shown in the DAG Runs menu of the UI. – NicoE May 27 '21 at 13:30
  • took me a bit but yes it makes sense. – Jonathan Duran May 27 '21 at 23:20

1 Answer


This is expected. As you mentioned, Airflow schedules a run at the end of its interval. With your setup, the schedule looks like this:

The 1st run will start on 2021-02-08; its execution_date will be 2021-01-08.

The 2nd run will start on 2021-03-08; its execution_date will be 2021-02-08.

The 3rd run will start on 2021-04-08; its execution_date will be 2021-03-08.

The 4th run will start on 2021-05-08; its execution_date will be 2021-04-08.

The 5th run will start on 2021-06-08; its execution_date will be 2021-05-08.
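To make the list above concrete, here is a small stdlib-only sketch that reproduces it. It is not Airflow code and does not use a cron parser: the helpers `next_point` and `schedule` are hypothetical, `next_point` hard-codes the `'0 16 8 * *'` pattern (16:00 on the 8th of every month), and the datetimes are naive wall-clock values standing in for US/Eastern. Each run's `execution_date` is the start of its interval, and the run is only triggered at the next schedule point, i.e. when the interval ends.

```python
from datetime import datetime, timedelta

# Hypothetical helper: hard-codes the cron '0 16 8 * *' (16:00 on the 8th of
# every month). This is NOT Airflow's or croniter's API, just an illustration.
def next_point(dt: datetime) -> datetime:
    """Return the first 16:00-on-the-8th strictly after dt."""
    candidate = dt.replace(day=8, hour=16, minute=0, second=0, microsecond=0)
    if candidate <= dt:
        # This month's slot is not after dt, so move to the 8th of next month.
        if dt.month == 12:
            candidate = candidate.replace(year=dt.year + 1, month=1)
        else:
            candidate = candidate.replace(month=dt.month + 1)
    return candidate


def schedule(start_date: datetime, runs: int):
    """Yield (execution_date, triggered_at) pairs.

    execution_date is the start of each interval; the run is only
    triggered once the interval ends, at the following schedule point.
    """
    # First schedule point on or after start_date.
    execution_date = next_point(start_date - timedelta(microseconds=1))
    for _ in range(runs):
        triggered_at = next_point(execution_date)
        yield execution_date, triggered_at
        execution_date = triggered_at


start = datetime(2021, 1, 8)  # naive stand-in for 2021-01-08 in US/Eastern
for n, (execution_date, triggered_at) in enumerate(schedule(start, 5), start=1):
    print(f"run {n}: starts {triggered_at:%Y-%m-%d %H:%M}, "
          f"execution_date {execution_date:%Y-%m-%d %H:%M}")
```

The printed lines match the list: each run starts exactly one month after its `execution_date`, so the run labelled 2021-05-08 cannot start before 2021-06-08 16:00.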

Since you actually deployed the DAG on 2021-05-26, Airflow immediately executed the 1st to 4th runs, because the intervals for those runs had already ended. The 5th run has not started yet because its interval has not ended; it will end on 2021-06-08.
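The catch-up decision described above comes down to one comparison per interval: a run is created only once its interval has ended. A minimal sketch, assuming the five intervals from the schedule listed above (copied by hand into the hypothetical `INTERVALS` list, as naive wall-clock datetimes standing in for US/Eastern) and a deployment time of 2021-05-26; `due_runs` is an illustrative helper, not part of Airflow:

```python
from datetime import datetime

# (execution_date, interval_end) pairs copied by hand from the schedule above.
# Naive datetimes standing in for US/Eastern wall-clock times.
INTERVALS = [
    (datetime(2021, 1, 8, 16), datetime(2021, 2, 8, 16)),
    (datetime(2021, 2, 8, 16), datetime(2021, 3, 8, 16)),
    (datetime(2021, 3, 8, 16), datetime(2021, 4, 8, 16)),
    (datetime(2021, 4, 8, 16), datetime(2021, 5, 8, 16)),
    (datetime(2021, 5, 8, 16), datetime(2021, 6, 8, 16)),
]


def due_runs(now: datetime):
    """Return the execution_dates whose interval has ended by `now`.

    With catchup=True, all of these are created as soon as the DAG is
    deployed; an interval that is still open gets no run yet.
    """
    return [start for start, end in INTERVALS if end <= now]


deployed = datetime(2021, 5, 26)
for start in due_runs(deployed):
    print("run created for execution_date", start.date())

# The 2021-05-08 interval only ends on 2021-06-08 at 16:00.
print(datetime(2021, 5, 8, 16) in due_runs(deployed))  # False
print(datetime(2021, 5, 8, 16) in due_runs(datetime(2021, 6, 8, 16)))  # True
```

At deployment time only the four closed intervals (execution dates 2021-01-08 through 2021-04-08) qualify; the May interval becomes due on 2021-06-08.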

You can find a more extensive explanation of why Airflow behaves like this in this answer.

Elad Kalif