
I have a DAG with one task, and I want it to be triggered only once a day. My problem is that when the time comes, it gets triggered multiple times: the daily task runs 4 times instead of once. I set a number of configurations to fix that, including:

'retries': 1
 catchup=False, max_active_runs=1

I also increased the time between retries, thinking Airflow might consider the task failed or not started, since the task can take some time to finish.

I also moved all the code that the DAG runs into the utils folder, based on this answer.

But I don't know what I'm missing here. Can anyone please help? Thank you in advance.

Here is the DAG:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

from utils.postgres import postgres_backup_to_s3

default_args = {
    'retries': 1,
    'retry_delay': timedelta(minutes=30),  # the backup and the S3 upload might take some time
    'start_date': datetime(2021, 1, 1)
}

with DAG('postgres_backup', default_args=default_args, schedule_interval='0 19 * * * *',
         catchup=False, max_active_runs=1) as dag:
    postgres_backup_to_s3_task = PythonOperator(task_id="postgres_backup_to_s3", python_callable=postgres_backup_to_s3)
asal

1 Answer


If your goal is to run the job once per day at 7 PM (19:00), then your schedule_interval is incorrect. You want 0 19 * * *.

In the Airflow documentation, the examples for schedule_interval consist of 5 space-separated fields: https://airflow.apache.org/docs/apache-airflow/1.10.1/scheduler.html#dag-runs.
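To make the difference concrete, here is a minimal, Airflow-free sketch in plain Python. The helper function is hypothetical (not part of Airflow); it just counts the space-separated fields, which is enough to show that the expression from the question has six fields while a standard cron schedule has five:

```python
def cron_field_count(expr: str) -> int:
    """Count the space-separated fields in a cron expression."""
    return len(expr.split())

# The expression from the question: six fields instead of the standard
# five (minute hour day-of-month month day-of-week), so the schedule
# does not mean "daily at 19:00".
buggy = "0 19 * * * *"

# Five fields: minute=0, hour=19, every day -> once a day at 19:00.
fixed = "0 19 * * *"

print(cron_field_count(buggy))  # 6
print(cron_field_count(fixed))  # 5
```

With the five-field form, the DAG definition would use schedule_interval='0 19 * * *' and the rest of the code can stay unchanged.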

  • Hi, thank you for replying. I changed the format and now the dag does not get triggered period. – asal Nov 10 '21 at 00:22