
We are using Airflow 2.0.0. I am trying to implement a DAG that does two things:

  1. Trigger reports via API
  2. Download reports from source to destination.

There needs to be a gap of at least 2-3 hours between tasks 1 and 2. From my research, I see two options:

  1. Two DAGs, one per task: schedule the 2nd DAG two hours after the 1st DAG.
  2. A delay between the two tasks, as mentioned here.

Is there a preference between the two options? Is there a third option with Airflow 2.0? Please advise.

user2452057

2 Answers


Another option would be to have a sensor wait for the report to be present. You can use a sensor's reschedule mode to free up worker slots between checks.

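# GenerateOperator, WaitForReportSensor and DownloadReportOperator are placeholders for your own operators
generate_report = GenerateOperator(...)
# Check every 5 minutes (poke_interval is in seconds); reschedule mode releases the worker slot between checks
wait_for_report = WaitForReportSensor(mode='reschedule', poke_interval=5 * 60, ...)
download_report = DownloadReportOperator(...)

generate_report >> wait_for_report >> download_report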
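A rough way to see what reschedule mode buys you is a simplified pure-Python model (not Airflow code; `pokes_until_ready` and `slot_seconds` are hypothetical helpers). It assumes pokes start at t = 0, poke_interval, 2 × poke_interval and so on (in seconds), that each poke occupies a worker for `poke_cost` seconds, that a `'poke'`-mode sensor keeps its slot from the first poke until the successful one finishes, and that a `'reschedule'`-mode sensor holds a slot only while a poke is running.

```python
import math


def pokes_until_ready(ready_after: float, poke_interval: float) -> int:
    """Pokes (the first at t=0) until one runs at or after ready_after seconds."""
    return math.ceil(ready_after / poke_interval) + 1


def slot_seconds(mode: str, ready_after: float, poke_interval: float, poke_cost: float) -> float:
    """Seconds a worker slot is occupied while the sensor waits (simplified model)."""
    n = pokes_until_ready(ready_after, poke_interval)
    if mode == "poke":
        # Sleeps between pokes but never gives the slot back: held until the last poke ends
        return (n - 1) * poke_interval + poke_cost
    if mode == "reschedule":
        # Slot is released between pokes, so only the pokes themselves count
        return n * poke_cost
    raise ValueError(f"unknown mode: {mode!r}")


# Report ready after 2 hours, poke every 5 minutes, each poke takes 2 seconds
for mode in ("poke", "reschedule"):
    print(mode, slot_seconds(mode, ready_after=2 * 60 * 60, poke_interval=5 * 60, poke_cost=2))
# poke 7202
# reschedule 50
```

Under these assumptions, the poke-mode sensor ties up a slot for the whole two hours, while the rescheduled one uses it for under a minute in total. The Airflow sensor docs recommend keeping the poke interval above one minute in reschedule mode to avoid extra load on the scheduler.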
Tomasz Urbaszek

A third option would be to use a sensor between the two tasks that waits for the report to become ready: either an off-the-shelf sensor, if one exists for your source, or a custom one that subclasses the base sensor.

The first two options are different implementations of a fixed waiting time, which has two problems:

  1. What if the report is still not ready after the predefined time?
  2. If the report is ready earlier, you wait unnecessarily.
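Both problems show up in a toy timeline model (not Airflow code; `fixed_wait` and `polling` are hypothetical helpers, and times are in minutes). The fixed-wait download runs at minute `wait` no matter what, so it returns `None` when the report is not ready yet; the polling sensor checks at minute 0 and then every `poke_interval` minutes, succeeding at the first check after the report is ready, or giving up once `timeout` is exceeded.

```python
import math
from typing import Optional


def fixed_wait(ready_at: int, wait: int) -> Optional[int]:
    """Minute the download succeeds after a fixed delay, or None if it runs too early."""
    return wait if wait >= ready_at else None


def polling(ready_at: int, poke_interval: int, timeout: int) -> Optional[int]:
    """Minute of the first check that finds the report, or None if the sensor times out."""
    # Checks happen at minutes 0, poke_interval, 2 * poke_interval, ...
    first_hit = math.ceil(ready_at / poke_interval) * poke_interval
    return first_hit if first_hit <= timeout else None


# Fixed 2-hour gap vs. a sensor checking every 5 minutes with a 6-hour timeout
for ready_at in (90, 200):
    print(ready_at, fixed_wait(ready_at, wait=120), polling(ready_at, poke_interval=5, timeout=6 * 60))
# 90 120 90
# 200 None 200
```

In the first case the fixed gap wastes 30 minutes; in the second it downloads before the report exists, whereas the sensor simply keeps checking. If the report never appears, the sensor fails after its timeout rather than the download task failing on missing data.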

SergiyKolesnikov