
I am running an hourly process that picks up data from one location ("origin") and moves it to another ("destination"). For the most part, the data arrives at the origin at a specific time and everything works fine, but there can be delays, and when that happens the task in Airflow fails and has to be re-run manually. One way to solve this is to give the data more time to arrive, but I would prefer to do that only when there actually is a delay. I also don't want a sensor that waits on the data for a long time, as it can cause deadlocks (ideally, an hourly task should not run for longer than one hour). Does Airflow allow rescheduling a task on a given condition (it failed, or no data exists), so that we don't have to re-run failed tasks manually?

Thanks!

Nir Ben Yaacov
  • I'm struggling with the same problem. I use an SQLSensor, but as you mention, sometimes there are delays. I don't know if you have found a solution. – Luis Miguel May 29 '23 at 21:11

1 Answer


Check out the following parameters of the BaseOperator (the parent class of all operators):

  • retry_delay (timedelta) – delay between retries
  • retry_exponential_backoff (bool) – allow progressively longer waits between retries by using an exponential backoff algorithm on the retry delay (the delay will be converted into seconds)
  • max_retry_delay (timedelta) – maximum delay interval between retries

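Getting a good mix of these three, together with `retries` (the number of retries to attempt before the task is marked as failed; it defaults to 0 unless configured otherwise, so it has to be set explicitly), should give you what you want.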
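To choose a mix that keeps an hourly run inside its hour, it helps to add up the waits. The sketch below is a simplified model, not Airflow's own code: it assumes that with `retry_exponential_backoff` the wait doubles on every retry and is then capped at `max_retry_delay`. Airflow's real scheduler also adds a deterministic per-task jitter to the backoff, so actual waits tend to be longer than this estimate (though still capped by `max_retry_delay`). The function name `retry_delays` is hypothetical.

```python
from datetime import timedelta


def retry_delays(retries, retry_delay, exponential_backoff=False, max_retry_delay=None):
    """Estimate the wait before each retry (simplified model, no jitter).

    Assumption: with backoff the wait doubles on each retry
    (retry_delay * 2**n) and is capped at max_retry_delay if one is given.
    """
    delays = []
    for n in range(retries):
        delay = retry_delay * (2 ** n) if exponential_backoff else retry_delay
        if max_retry_delay is not None:
            delay = min(delay, max_retry_delay)
        delays.append(delay)
    return delays


# 3 retries starting at 5 minutes, doubling, capped at 15 minutes: 5 + 10 + 15.
budget = retry_delays(3, timedelta(minutes=5), exponential_backoff=True,
                      max_retry_delay=timedelta(minutes=15))
total_wait = sum(budget, timedelta())
print(total_wait)  # 0:30:00
```

A 30-minute retry budget leaves headroom inside the hour for the task's own run time and for the jitter this model ignores. Raising `retries` or the cap quickly eats into that margin.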

https://incubator-airflow.readthedocs.io/en/latest/code.html

trejas
  • These can be set in your DAG's default args, or for each individual task/operator. – trejas Apr 21 '19 at 19:08
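A minimal sketch of that setup, assuming a Python callable task. The dict keys are the BaseOperator arguments listed in the answer plus `retries`, so the dict could be passed as `DAG(..., default_args=default_args)` or unpacked into a single operator. The callable `move_if_arrived` and its path argument are hypothetical: the idea is that it fails fast when the data has not arrived instead of blocking a worker, and Airflow then retries it after the configured delay.

```python
import os
from datetime import timedelta

# Retry settings; keys match BaseOperator keyword arguments.
default_args = {
    "retries": 3,  # default is 0, in which case no retry is attempted
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(minutes=15),
}


def move_if_arrived(origin_path: str) -> str:
    """Hypothetical task callable that fails fast if the data is missing.

    Raising an exception fails this attempt; if retries remain, Airflow
    marks the task up_for_retry and runs it again after the retry delay.
    """
    if not os.path.exists(origin_path):
        raise FileNotFoundError(f"data has not arrived yet: {origin_path}")
    # ... move the data from origin to destination here ...
    return origin_path
```

Inside a DAG this might be wired up as `PythonOperator(task_id="move_data", python_callable=move_if_arrived, op_args=[...])`, with the real origin path filled in.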