6

There seems to have been previous discussion about this:

How do I stop Airflow running a task the first time when I unpause it?

https://groups.google.com/g/cloud-composer-discuss/c/JGtmAd7xcsM?pli=1

When I deploy a DAG to run at a specific time (say, once a day at 9AM), Airflow immediately runs the DAG at deployment.

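from datetime import datetime

from airflow import DAG

dag = DAG(
    'my_dag',  # a dag_id may not contain spaces
    default_args=default_args,
    schedule_interval='00 09 * * *',  # every day at 09:00
    start_date=datetime(2021, 1, 1),
    catchup=False,  # don't backfill past intervals; run only the latest one
)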

That's because with catchup=False, the scheduler "creates a DAG run only for the latest interval", as stated in the docs.

https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html

What I want is for not even the DAG run for the latest interval to start. I want nothing to happen until the next time the clock strikes 9AM.

It seems like out of the box, Airflow does not have any native solution to this problem.

What are some workarounds that people have been using? Perhaps something like checking whether the current time is close to next_execution_date?
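The "is the current time close to next_execution_date" check could be sketched roughly as below. This is only an illustration: `is_on_schedule` and the 30-minute tolerance are hypothetical, not anything Airflow provides. In a real DAG the function might serve as the callable of a ShortCircuitOperator, with `interval_end` taken from the task context (`next_execution_date`, or `data_interval_end` on Airflow 2.2+); that wiring is an assumption and is not shown here.

```python
from datetime import datetime, timedelta, timezone


def is_on_schedule(interval_end, now=None, tolerance=timedelta(minutes=30)):
    """Return True when `now` is at most `tolerance` past `interval_end`.

    A normally scheduled run starts shortly after its interval ends, while a
    run triggered by deploying or unpausing the DAG usually starts much later.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - interval_end <= tolerance


# The interval ended at 09:00 UTC.
interval_end = datetime(2021, 10, 14, 9, 0, tzinfo=timezone.utc)

# Run started at 14:00 (e.g. right after deployment): too late, should be skipped.
late = is_on_schedule(
    interval_end, now=datetime(2021, 10, 14, 14, 0, tzinfo=timezone.utc)
)

# Run started at 09:01 (the regular schedule): should proceed.
on_time = is_on_schedule(
    interval_end, now=datetime(2021, 10, 14, 9, 1, tzinfo=timezone.utc)
)

print(late, on_time)
```

The tolerance has to be wide enough to cover normal scheduler delay, otherwise legitimate runs would be skipped as well.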

user3240688

2 Answers

3

When you update your DAG, you can set start_date to the next day. However, this won't help if you pause and then unpause the DAG.

Note that start_date is recommended to be a static value (avoid datetime.now() or similar dynamic values), so for every deployment you need to specify a new value such as datetime(2021, 10, 15), datetime(2021, 10, 16), ... which might make deployment more difficult.
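One way to keep the value static inside the deployed file while still moving it forward could be to let the deployment script compute the next day's date and write it into the DAG file as a literal. A minimal sketch, assuming a hypothetical helper `next_day_start_date_literal` and some templating step of your own that substitutes its result into the DAG source:

```python
from datetime import date, timedelta


def next_day_start_date_literal(deploy_day):
    """Build the source text of a static start_date for the day after deploy_day."""
    next_day = deploy_day + timedelta(days=1)
    return f"datetime({next_day.year}, {next_day.month}, {next_day.day})"


# Deploying on 14 Oct 2021 yields the literal to paste into the DAG file.
literal = next_day_start_date_literal(date(2021, 10, 14))
print(literal)
```

Because the date is computed once at deployment time and stored as a plain literal, the scheduler sees the same start_date every time it parses the file.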

Mikhail Tokarev
    I think that would work. But I keep on reading that it is not recommended to set `start_date` to be dynamic, like https://marclamberti.com/blog/apache-airflow-best-practices-1/ – user3240688 Oct 13 '21 at 19:42
1
  1. With the DAG paused, create a DAG run (http.://.../dagrun/add) with Execution Date set to the one you need to skip. This makes its task instances accessible in the UI.
  2. Mark those task instances as success in the UI.
  3. Unpause the DAG.
koli