tl;dr, Problem framing:
Assume I have a sensor poking with timeout = 24*60*60. Since the connection does occasionally time out, retries must be allowed. When the sensor retries, however, the timeout is applied to every new try with the full initial 24*60*60, and therefore the task does not time out after 24 hrs as intended.
Question:
Is there a way to restrict the maximum overall run time of a task - something like a meta-timeout?
Airflow version: 1.10.14
Walk-through:
import os
from datetime import timedelta

from airflow import DAG
from airflow.contrib.sensors.file_sensor import FileSensor
# InitCleanProcFolderOperator is one of our own custom operators (import omitted)

BASE_DIR = "/some/base/dir/"
FILE_NAME = "some_file.xlsx"
VOL_BASE_DIR = "/some/mounted/vol/"

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": "2020-11-01",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "supplier",
    default_args=default_args,
    description="ETL Process for Supplier",
    schedule_interval=None,
    catchup=False,
    max_active_runs=1,
)

file_sensor = FileSensor(
    task_id="file_sensor",
    poke_interval=60 * 60,      # poke once per hour
    timeout=24 * 60 * 60,       # intended overall limit: 24 hrs
    retries=4,
    mode="reschedule",
    filepath=os.path.join(BASE_DIR, FILE_NAME),
    fs_conn_id="conn_filesensor",
    dag=dag,
)

clean_docker_vol = InitCleanProcFolderOperator(
    task_id="clean_docker_vol",
    folder=VOL_BASE_DIR,
    dag=dag,
)
....
This DAG should run and check whether a file exists; if it does, the downstream tasks continue. Occasionally the sensor task is rescheduled because the file is provided too late, or it fails on connection errors and is retried. The maximum overall run time of the DAG should NOT exceed 24 hrs. Because of the retries, however, the total time does exceed the 24 hrs timeout whenever the task fails and is retried, since each new try starts with a fresh 24 hrs budget (see the simplified sketch below).
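For context, this is roughly how I read the BaseSensorOperator execute loop in 1.10 (heavily simplified, poke mode only; simplified_sensor_execute is just my own illustration, not the actual source). The point is that the timeout is measured from the start of the current try, so every retry starts the clock again:

import time

from airflow.exceptions import AirflowSensorTimeout
from airflow.utils import timezone


def simplified_sensor_execute(sensor, context):
    # started_at is (re-)established on every try, so a retried sensor
    # gets a fresh `timeout` budget instead of the remaining one
    started_at = timezone.utcnow()
    while not sensor.poke(context):
        if (timezone.utcnow() - started_at).total_seconds() > sensor.timeout:
            raise AirflowSensorTimeout("sensor timed out for this try")
        time.sleep(sensor.poke_interval)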
Example:
- runs for 4 hrs (20 hrs should be left)
- fails
- up_for_retry
- starts again with a fresh 24 hrs timeout, not the remaining 20 hrs.
As I need to allow retries, simply setting retries to 0 to avoid this behavior is not an option. I am rather looking for a meta-timeout setting in Airflow, a hint at how this could be implemented within the related classes, or any other workaround.
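One workaround I am currently considering is subclassing the sensor and enforcing the overall deadline myself against the DAG run's start_date, so that the budget is shared across retries. Untested sketch below; DeadlineFileSensor and hard_timeout are names I made up, and I am not entirely sure that AirflowFailException (fail without retrying) is available in 1.10.14. Is this the right direction, or is there a built-in way?

import os
from datetime import timedelta

from airflow.contrib.sensors.file_sensor import FileSensor
from airflow.exceptions import AirflowFailException
from airflow.utils import timezone
from airflow.utils.decorators import apply_defaults


class DeadlineFileSensor(FileSensor):
    # hypothetical subclass: enforces one overall deadline measured from the
    # DAG run's start_date, independent of how many retries are still left
    @apply_defaults
    def __init__(self, hard_timeout, *args, **kwargs):
        super(DeadlineFileSensor, self).__init__(*args, **kwargs)
        self.hard_timeout = timedelta(seconds=hard_timeout)

    def poke(self, context):
        elapsed = timezone.utcnow() - context["dag_run"].start_date
        if elapsed > self.hard_timeout:
            # fail hard so the remaining retries do not restart the clock
            raise AirflowFailException(
                "hard timeout of %s exceeded (elapsed: %s)" % (self.hard_timeout, elapsed)
            )
        return super(DeadlineFileSensor, self).poke(context)


file_sensor = DeadlineFileSensor(
    task_id="file_sensor",
    poke_interval=60 * 60,
    timeout=24 * 60 * 60,        # per-try timeout, as before
    hard_timeout=24 * 60 * 60,   # overall budget shared across all retries
    retries=4,
    mode="reschedule",
    filepath=os.path.join(BASE_DIR, FILE_NAME),
    fs_conn_id="conn_filesensor",
    dag=dag,
)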
Many thanks.