I created a DAG that will run on a weekly basis. Below is what I tried and it's working as expected.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

SCHEDULE_INTERVAL = timedelta(weeks=1, seconds=00, minutes=00, hours=00)
default_args = {
    'depends_on_past': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=2),
    'wait_for_downstream': True,
    'provide_context': True,
    'start_date': datetime(2020, 12, 20, hour=00, minute=00, second=00)
}

with DAG("DAG", default_args=default_args, schedule_interval=SCHEDULE_INTERVAL, catchup=True) as dag:
    t1 = BashOperator(
        task_id='dag_schedule',
        bash_command='echo DAG',
        dag=dag)

As scheduled, it ran on the 27th (a start date of the 20th in the script). Because of a change in requirements, I updated the start date so that the DAG runs on the 30th instead (a start date of the 23rd in the script); my idea is to start the schedule from the 30th and run every week from then on. After I changed the start date, the DAG did not pick up the new start date, and I'm not sure why. When I deleted the DAG (it is a test DAG, so I could delete it; in prod I can't) and created a new DAG with the same name and the new start date, it ran as scheduled.

data_addict
  • "DAG is not picking as per the latest start date" - what does this mean? – mangusta Dec 30 '20 at 11:43
  • start_date of 30/DEC/2020 and interval of 1 week - means the first run will start on 6/Jan/2021. So I'm not sure what is the issue you are referring to? – Elad Kalif Dec 30 '20 at 11:47
  • @Elad if start date is 30/DEC/2020, first dag run will happen on 30/DEC/2020 – mangusta Dec 30 '20 at 11:54
  • @mangusta dag is triggered at the **END** of the interval. To know when DAG is triggered you must supply both start_date and interval. – Elad Kalif Dec 30 '20 at 11:57
  • @mangusta see https://stackoverflow.com/questions/65196414/problem-with-start-date-and-scheduled-date-in-apache-airflow/65196624#65196624 – Elad Kalif Dec 30 '20 at 11:58
  • @Elad ah you're right, I've been dealing with hourly intervals only, so missed the point that it was weekly. Right, it's supposed to run on Jan 6 – mangusta Dec 30 '20 at 12:14
  • @Elad, you are right, I need to set the start date as the 23rd. Here 27 means 20 and 30 means 27 in the script. I did that locally while testing and forgot to update the script here. I have updated it now. The DAG is not triggering only for the start date of the 23rd. I am following the UTC timezone. – data_addict Dec 30 '20 at 12:47

2 Answers

As per the Airflow docs:

When needing to change your start_date and schedule interval, change the name of the dag (a.k.a. dag_id) - I follow the convention : my_dag_v1, my_dag_v2, my_dag_v3, my_dag_v4, etc...

  • Changing schedule interval always requires changing the dag_id, because previously run TaskInstances will not align with the new schedule interval
  • Changing start_date without changing schedule_interval is safe, but changing to an earlier start_date will not create any new DagRuns for the time between the new start_date and the old one, so tasks will not automatically backfill to the new dates. If you manually create DagRuns, tasks will be scheduled, as long as the DagRun date is after both the task start_date and the dag start_date.

So if we change the start date, we need to either change the DAG name or delete the existing DAG so that it is recreated under the same name (deleting it removes the previous DAG's records from the metadata database).

Source
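
The versioning convention quoted above can be sketched roughly as follows. This is only an illustration, not something taken from the Airflow docs: `DAG_BASE_NAME`, `DAG_VERSION`, `versioned_dag_id` and `dag_kwargs` are hypothetical names, and the start date of the 23rd is the one mentioned in the question. The snippet only builds the arguments; in a real DAG file they would be passed to `DAG(...)`.

```python
from datetime import datetime, timedelta

# Hypothetical names, used only for illustration.
DAG_BASE_NAME = "my_dag"
DAG_VERSION = 2  # bump this whenever start_date or schedule_interval changes


def versioned_dag_id(base_name, version):
    """Build a dag_id following the my_dag_v1, my_dag_v2, ... convention."""
    return f"{base_name}_v{version}"


# Arguments that would be passed to the DAG constructor in a DAG file,
# e.g. `with DAG(**dag_kwargs) as dag: ...` (assumes Airflow is installed).
dag_kwargs = {
    "dag_id": versioned_dag_id(DAG_BASE_NAME, DAG_VERSION),
    "start_date": datetime(2020, 12, 23),
    "schedule_interval": timedelta(weeks=1),
    "catchup": True,
}

print(dag_kwargs["dag_id"])
```

Because the scheduler sees the new dag_id as a different DAG, the run history of the old one no longer affects when the new one is scheduled.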

Mathias711
de-learner

Your DAG, as you defined it, will be triggered on 6-Jan-2021.

Airflow schedules tasks at the END of the interval (see doc reference).

So per your settings:

SCHEDULE_INTERVAL = timedelta(weeks=1, seconds=00, minutes=00, hours=00)

and

'start_date': datetime(2020, 12 , 30, hour=00, minute=00, second=00)

This means the first run will be on 6-Jan-2021, because 30-Dec-2020 + 1 week = 6-Jan-2021. Note that the execution_date of this run will be 2020-12-30.
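
The arithmetic can be checked with plain `datetime`, without Airflow. This is a minimal sketch of the end-of-interval rule described above, not Airflow's scheduler code; `first_run` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def first_run(start_date, interval):
    """Return (execution_date, trigger_time) for the first scheduled run.

    Assumes the rule described above: a run is triggered at the END of
    its interval and its execution_date is the START of that interval.
    """
    execution_date = start_date
    trigger_time = start_date + interval
    return execution_date, trigger_time


execution_date, trigger_time = first_run(datetime(2020, 12, 30), timedelta(weeks=1))
print(execution_date.date(), trigger_time.date())  # 2020-12-30 2021-01-06
```

Under the same rule, a start date of 2020-12-23 with a weekly interval gives a first trigger on 2020-12-30.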

Elad Kalif
  • Sorry, I forgot to mention: for the 27th I set the start date to the 20th, and for the 30th I set it to the 23rd. I forgot to update the script; it is updated now. – data_addict Dec 30 '20 at 12:45