
Many of the Airflow example DAGs that have `schedule_interval=None` set a dynamic start date like `airflow.utils.dates.days_ago(2)` or `datetime.utcnow()`. However, the docs recommend against a dynamic start date:

> We recommend against using dynamic values as `start_date`, especially `datetime.now()` as it can be quite confusing. The task is triggered once the period closes, and in theory an `@hourly` DAG would never get to an hour after now as `now()` moves along.
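
To see why the quoted warning holds, here is a simplified, standard-library-only model of the rule it describes: a run for a period is triggered only once that period closes. The function `first_run_due`, the `INTERVAL` constant and the fixed timestamps are hypothetical illustrations, not Airflow's scheduler code.

```python
from datetime import datetime, timedelta

# Stand-in for an @hourly schedule (assumption: a fixed one-hour interval).
INTERVAL = timedelta(hours=1)


def first_run_due(start_date: datetime, now: datetime) -> bool:
    """Hypothetical simplified rule: the first run covers
    [start_date, start_date + INTERVAL) and is triggered only once
    that period has closed."""
    return now >= start_date + INTERVAL


# Static start date: the first period closes one hour later.
static_start = datetime(2018, 6, 1)
print(first_run_due(static_start, datetime(2018, 6, 1, 1)))  # True

# Dynamic start date: start_date=datetime.now() is re-evaluated each time
# the DAG file is parsed, so it always equals the current time and the
# first period never closes.
for minutes in (0, 30, 59, 120):
    now = datetime(2018, 6, 1) + timedelta(minutes=minutes)
    dynamic_start = now
    print(first_run_due(dynamic_start, now))  # False on every iteration
```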

Is `start_date` irrelevant for manually triggered DAGs? What is the best practice here?

rcorre
  • I'm not sure if I'm clear on what the problem you're encountering is. Can you add more context to your question around what you're trying to achieve, like specifically if a dynamic `start_date` is not working for you? The approach you described seems fine to me as `start_date` isn't too important for a DAG that's only externally triggered. This is a good question because I don't think the current docs make this use case explicitly clear. – Taylor D. Edmiston Jun 07 '18 at 17:24
  • @TaylorEdmiston no observable problem, just a conflict between the docs and the examples that makes me feel unsure as a user. The tutorial talks about start_date a lot, so I wasn't confident that it was really irrelevant. – rcorre Jun 07 '18 at 20:54

3 Answers


For manually triggered DAGs, I always try to set the start date to the day I first ran the DAG, so that in the future I have a record of when it was first run.

Zack

If you have `schedule_interval=None`, I believe the `start_date` is irrelevant, as Airflow will not attempt to perform any backfilling. Just set it to anything; even a dynamic one shouldn't cause any hassle.
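
The reasoning in this answer can be illustrated with a toy model of how a scheduler enumerates runs between `start_date` and now. `scheduled_runs` is a hypothetical helper, not Airflow's implementation; it only shows that with no interval (the analogue of `schedule_interval=None`) there is nothing to schedule or catch up on automatically, so `start_date` never triggers a run by itself.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def scheduled_runs(start_date: datetime, interval: Optional[timedelta],
                   now: datetime) -> List[datetime]:
    """Hypothetical model: return the logical dates of every period between
    start_date and now that has already closed. No interval means no
    automatic runs at all; such a DAG runs only when triggered externally."""
    if interval is None:
        return []
    runs = []
    logical_date = start_date
    # A period [logical_date, logical_date + interval) is runnable once closed.
    while logical_date + interval <= now:
        runs.append(logical_date)
        logical_date += interval
    return runs


now = datetime(2018, 1, 5)
print(len(scheduled_runs(datetime(2018, 1, 1), timedelta(days=1), now)))  # 4
print(scheduled_runs(datetime(2018, 1, 1), None, now))  # []
```

This model deliberately ignores manually requested backfills, which the scheduler never creates on its own.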

Simon D
  • I would not recommend this. Even for manually triggered DAGs, you may want to manually backfill, e.g. with the `airflow backfill` command. If the dynamic start date is *after* the chosen backfill date, it will block your tasks. – Andy Carlson Apr 25 '19 at 16:05
  • I don't fully understand why you would want to manually trigger a backfill for a DAG that has no interval defined, but okay. And yes, you obviously need to be sure the start date is in the past, even if it is dynamic. – Simon D Apr 25 '19 at 23:56
  • We do it when we need the `execution_date` to be in the past. – Andy Carlson Apr 26 '19 at 15:15
  • When you manually trigger the DAG with `trigger_dag`, you can provide an execution date there. – Simon D Apr 26 '19 at 16:00

I ended up just setting start_date to 1970, Jan 1st (absurdly far in the past) so that Airflow never complains that the execution date is before the start date.
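
The comment thread above and this answer hinge on one rule, as the commenters describe it: a task will not run for an execution date earlier than its `start_date`. The sketch below models that rule with the standard library only. `can_run` is a hypothetical stand-in for Airflow's check, and `days_ago_like` approximates `airflow.utils.dates.days_ago` (midnight of the current day minus n days); neither is Airflow's actual code, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def days_ago_like(n: int, now: datetime) -> datetime:
    """Approximation of airflow.utils.dates.days_ago: midnight of `now`,
    minus n days."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


def can_run(execution_date: datetime, start_date: datetime) -> bool:
    """Hypothetical model of the check: execution dates before the
    start date are refused."""
    return execution_date >= start_date


now = datetime(2019, 4, 25, 16, 5, tzinfo=timezone.utc)
backfill_date = datetime(2019, 4, 1, tzinfo=timezone.utc)

dynamic_start = days_ago_like(2, now)  # 2019-04-23 00:00 UTC
epoch_start = datetime(1970, 1, 1, tzinfo=timezone.utc)

print(can_run(backfill_date, dynamic_start))  # False: the backfill is blocked
print(can_run(backfill_date, epoch_start))  # True
```

With a fixed 1970 start date, any realistic past execution date passes the check, which is the point of this answer; a dynamic start date moves forward every day and eventually overtakes any backfill date you pick.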

Andy Carlson