
I have one DAG to which I pass a variety of configurations, and one of the settings I want to pass is how often it should run.

For example, using the same DAG, I have two different RUNS. RUN A I want to run daily. RUN B I want to run weekly. Both of these use the exact same DAG code but have different configurations passed.

As far as I can see, there is no easy way to pass the schedule within the configuration. The only solution I have is to make multiple DAGs with the exact same code but different schedules, which results in a lot of duplicated code.

Are there any better options?

For example, I have a DAG that is a web crawler, and I pass it URLs to crawl. I need to vary the crawl frequency for different sets of URLs. The URLs I pass can change, and there is no way to determine which schedule to use other than from the run parameters.

Adam K

1 Answer


In this case, since a daily schedule already covers every day a weekly run would fall on, it's best to have a single daily run and use a branch operator to decide which logic to follow based on the day of the week.

import pendulum

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.weekday import BranchDayOfWeekOperator

with DAG(
    dag_id="my_dag",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    catchup=False,
    schedule_interval="@daily",
) as dag:
    task_a = EmptyOperator(task_id='logic_a') # Replace with your actual operators for 1st configuration/logic
    task_b = EmptyOperator(task_id='logic_b') # Replace with your actual operators for 2nd configuration/logic

    branch = BranchDayOfWeekOperator(
        task_id="make_choice",
        follow_task_ids_if_true="logic_a",
        follow_task_ids_if_false="logic_b",
        week_day="Monday",
    )

    branch >> [task_a, task_b]

In this example the DAG runs every day. On Monday it follows task_a; on every other day of the week it follows task_b.
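The weekday decision made by the branch above can also be written as an ordinary function, which may be handy if the rule needs to grow beyond a single day. This is a minimal sketch in plain Python: the function name `choose_branch` is hypothetical, and wiring it into a Python-based branch operator is left out so the snippet runs without Airflow installed.

```python
from datetime import date


def choose_branch(run_date: date) -> str:
    """Return the task_id to follow for the given run date (hypothetical helper)."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return "logic_a" if run_date.weekday() == 0 else "logic_b"


if __name__ == "__main__":
    print(choose_branch(date(2022, 5, 30)))  # 2022-05-30 was a Monday
    print(choose_branch(date(2022, 5, 31)))  # 2022-05-31 was a Tuesday
```

The returned strings match the `task_id` values used in the DAG above, so the same rule selects the same downstream task.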

Elad Kalif
  • This is interesting, but both configurations use the exact same operators, and there aren't necessarily just two; there can be 20 configurations that all need different schedules. I pass in data with each configuration that is used within the operators, but all the operators are the same, so I need to schedule them programmatically based on the configuration. – Adam K May 29 '22 at 17:35
  • It's hard to provide an exact answer because your description is vague. The configuration details and the operator you use are unknown. In general, you can use the branch to push a value to XCom, which you can then use to identify which configuration applies. You can also use dynamic task creation for the tasks downstream of the branch, so that depending on the branch value only one side of the branch is followed. – Elad Kalif May 29 '22 at 17:50
  • By configuration, I mean the data I am passing. The operators and DAG code are exactly the same; I just need to customize the schedule. There are no separate branches, it is all one simple DAG (task a > task b > task c). The only thing that changes is that, depending on what data I pass to the DAG, I need a different schedule interval. – Adam K May 29 '22 at 17:56
  • As an example, I have a DAG that is a web crawler, and I pass it URLs to crawl. I need to vary the crawl frequency for different sets of URLs, basically. – Adam K May 29 '22 at 17:57
  • Use the branch to decide which frequency is relevant. logic_a / logic_b will push the proper value of the frequency parameter to XCom, and from there all downstream tasks can read the value and use it, e.g. a workflow of `branch >> [logic_a , logic_b] >> task_c >> task_d`. – Elad Kalif May 29 '22 at 18:46
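One way to read the suggestion in the comments when there are many configurations is to keep a single daily run and have an early task work out which configurations are due that day. The sketch below shows only that selection step in plain Python, so it runs without Airflow. Everything in it is an assumption for illustration: the `CONFIGS` layout, the `frequency` key, the example URLs, and the `due_configs` helper are hypothetical and not part of the original question or answer.

```python
from datetime import date

# Hypothetical configurations: each set of URLs carries its own frequency.
CONFIGS = {
    "news_sites": {"urls": ["https://example.com/news"], "frequency": "daily"},
    "archive_sites": {"urls": ["https://example.com/archive"], "frequency": "weekly"},
}


def is_due(frequency: str, run_date: date, weekly_day: int = 0) -> bool:
    """Decide whether a frequency applies on run_date (0 = Monday)."""
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return run_date.weekday() == weekly_day
    raise ValueError(f"unsupported frequency: {frequency!r}")


def due_configs(run_date: date, configs: dict) -> list:
    """Return the names of the configurations that should run on run_date."""
    return [name for name, cfg in configs.items()
            if is_due(cfg["frequency"], run_date)]


if __name__ == "__main__":
    print(due_configs(date(2022, 5, 30), CONFIGS))  # a Monday
    print(due_configs(date(2022, 5, 31), CONFIGS))  # a Tuesday
```

In a daily DAG, a task could presumably call something like `due_configs` with the run's date and hand the result to the downstream tasks (for instance through XCom, as the comment above suggests), so the operators stay identical and only the data decides what runs.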