-2

Hi, I have a DAG with some tasks that need to run in parallel, followed by other tasks that must wait until all of the parallel tasks have finished before they can start.

Here I've created a diagram of what I want. A1 and B1 should start at the same time when the DAG is started. A1 triggers A2 when it finishes, and B1 triggers B2 when it finishes, then B2 -> B3. C1 should start only when both A2 and B3 have finished, and C2 runs after C1 finishes.

How can I write the dependencies for this sort of structure? Thank you.

unnest_me
  • What have you tried so far? – Matt Jul 24 '23 at 13:47
  • Seconding what Matt said: what have you tried so far? What you are describing is the basic function of Airflow scheduling, so I would recommend looking at the Airflow docs. – Tevett Goad Jul 25 '23 at 17:03

2 Answers

1

Your diagram is already half the solution; what remains is the technical part of writing the DAG and its upstream dependencies.

If you need tasks to run in parallel, the parallel branches need to share a final task (C1 in this example). You also want C1 to run only after all of its upstream tasks have finished successfully. For that, use trigger_rule=all_success (it is the default).

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG(
        dag_id="test_dag",
        schedule_interval=None,
        catchup=False,  # catchup is a DAG-level argument, not a task default
        default_args={
            "start_date": datetime(2022, 1, 1),
            "retries": 0,
        },
        render_template_as_native_obj=True,
        tags=["test"],
) as dag:
    dag.doc_md = __doc__

    a1 = EmptyOperator(task_id="A1")
    a2 = EmptyOperator(task_id="A2")

    b1 = EmptyOperator(task_id="B1")
    b2 = EmptyOperator(task_id="B2")
    b3 = EmptyOperator(task_id="B3")

    c1 = EmptyOperator(task_id="C1", trigger_rule=TriggerRule.ALL_SUCCESS)
    c2 = EmptyOperator(task_id="C2")

    (a1 >> a2) >> c1
    (b1 >> b2 >> b3) >> c1
    c1 >> c2

In the Airflow Graph view, this DAG has the same shape as your diagram.

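If it helps to see why chains like `b1 >> b2 >> b3` and a list on the left such as `[a2, b3] >> c1` express these dependencies, here is a minimal sketch. The `Task` class below is a hypothetical stand-in written for illustration, not Airflow's implementation. It only assumes the behaviour that `x >> y` records `x` as upstream of `y` and evaluates to `y`, so the chain can continue.

```python
class Task:
    """Toy stand-in for an operator (hypothetical, not Airflow code)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()  # ids of tasks that must finish first

    def __rshift__(self, other):
        # task >> other, or task >> [other1, other2]
        targets = other if isinstance(other, (list, tuple)) else [other]
        for target in targets:
            target.upstream.add(self.task_id)
        return other  # returning the right-hand side lets chains continue

    def __rrshift__(self, other):
        # [t1, t2] >> task: lists have no __rshift__, so Python calls this
        for source in other:
            self.upstream.add(source.task_id)
        return self


a1, a2, b1, b2, b3, c1, c2 = (
    Task(name) for name in ["A1", "A2", "B1", "B2", "B3", "C1", "C2"]
)

a1 >> a2
b1 >> b2 >> b3
[a2, b3] >> c1 >> c2  # C1 waits for both branches, then C2 follows

print(sorted(c1.upstream))  # ['A2', 'B3']
```

With real operators, the same three dependency lines should be an equivalent way to write the dependencies in the DAG above.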

ozs
0

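This code works by first creating a DAG object. Then it adds seven tasks to the DAG: A1, B1, A2, B2, B3, C1, and C2. Each set_upstream call specifies the tasks that must be completed before a task can start. For example, a2.set_upstream(a1) specifies that A1 must be completed before A2 can start.

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator


def create_dag():
    dag = DAG("my_dag", start_date=datetime(2022, 1, 1), schedule_interval=None)

    # Create the seven tasks and attach them to the DAG.
    a1 = EmptyOperator(task_id="A1", dag=dag)
    b1 = EmptyOperator(task_id="B1", dag=dag)
    a2 = EmptyOperator(task_id="A2", dag=dag)
    b2 = EmptyOperator(task_id="B2", dag=dag)
    b3 = EmptyOperator(task_id="B3", dag=dag)
    c1 = EmptyOperator(task_id="C1", dag=dag)
    c2 = EmptyOperator(task_id="C2", dag=dag)

    # Each call lists the tasks that must finish before the task can start.
    a2.set_upstream(a1)
    b2.set_upstream(b1)
    b3.set_upstream(b2)
    c1.set_upstream([a2, b3])
    c2.set_upstream(c1)

    return dag


dag = create_dag()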
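The dependency structure itself can be sanity-checked without Airflow. This is a minimal sketch using Python's standard-library graphlib (Python 3.9+), not Airflow's scheduler: it maps each task to the tasks it waits on and collects the groups that become runnable together. The variable names are only illustrative.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
depends_on = {
    "A1": set(),
    "B1": set(),
    "A2": {"A1"},
    "B2": {"B1"},
    "B3": {"B2"},
    "C1": {"A2", "B3"},
    "C2": {"C1"},
}

sorter = TopologicalSorter(depends_on)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorter.get_ready()   # tasks whose dependencies are all done
    waves.append(sorted(ready))
    sorter.done(*ready)          # mark the whole group as finished

for wave in waves:
    print(wave)
# ['A1', 'B1']
# ['A2', 'B2']
# ['B3']
# ['C1']
# ['C2']
```

The groups are one valid execution order, not a promise about timing: in a real run A2 can start as soon as A1 is done, regardless of how far the B branch has progressed. The part that matters is that C1 only appears after both A2 and B3.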

Alternatively, you can define the DAG first and then pass it to each task explicitly:

a1 = EmptyOperator(task_id='A1', dag=dag)
b1 = EmptyOperator(task_id='B1', dag=dag)

a2 = PythonOperator(task_id='A2', python_callable=my_python_function, dag=dag)

etc....