0

I have an airflow DAG as below

task1 -> task2 -> [task3_1, task3_2]
task3_1 -> task4_1 -> task5_1
task3_2 -> task4_2 -> task5_2
[task5_1,task5_2] -> task_6 -> task_7 -> task_8

Tasks 1,2,6,7,8 are linear with 2 parallel jobs for 3,4,5 in between.

I need task_6 to run no matter what state upstream tasks are in, so I set branch_trigger_rule to "all_done" which works fine - It executes if upstream are all success or if some task fails.

But task_7 and task_8 gets executed as well once task_6 completes - I want both of them to execute only when all are success.

I tried setting branch_trigger_rule for task_7 and task_8 to "all_success" and even tried "none_failed" but no luck.

Am I missing something?

Below is the code

task_1 = add_task(dag_name=my_dag,task_type="bash",task_id="task_1",command='echo " The Dag [ {} ] started at [ `date` ]"; '.format(self.dag_name))

task_2 = add_task(dag_name=my_dag,task_type="bash",task_id="task_2",command=self.SPARK_SUBMIT_COMMAND_2")

task3_1 = add_task(dag_name=my_dag,task_type="bash",task_id="task3_1",command=self.SPARK_SUBMIT_COMMAND_3_1)

task3_2 = add_task(dag_name=my_dag,task_type="bash",task_id="task3_2",command=self.SPARK_SUBMIT_COMMAND_3_2)

task4_1 = add_task(dag_name=my_dag,task_type="bash",task_id="task4_1",command=self.SPARK_SUBMIT_COMMAND_4_1)

task4_2 = add_task(dag_name=my_dag,task_type="bash",task_id="task4_2",command=self.SPARK_SUBMIT_COMMAND_4_2)

task5_1 = add_task(dag_name=my_dag,task_type="bash",task_id="task5_1",command=self.SPARK_SUBMIT_COMMAND_5_1)

task5_2 = add_task(dag_name=my_dag,task_type="bash",task_id="task5_2",command=self.SPARK_SUBMIT_COMMAND_5_2)

task_6 = add_task(dag_name=my_dag,task_type="bash",task_id="task_6",command=self.SPARK_SUBMIT_COMMAND_6,branch_trigger_rule="all_done")

task_7 = add_task(dag_name=my_dag,task_type="bash",task_id="task_7",command=self.SPARK_SUBMIT_COMMAND_7,branch_trigger_rule="all_success")

task_8 = add_task(dag_name=my_dag,task_type="bash",task_id="task_8",command='echo " The Dag [ {} ] ended at [ `date` ]"; '.format(self.dag_name),branch_trigger_rule="all_success")

Tried none_failed as well for last 2 Tasks

task_7 = add_task(dag_name=my_dag,task_type="bash",task_id="task_7",command=self.SPARK_SUBMIT_COMMAND_7,branch_trigger_rule="none_failed")

task_8 = add_task(dag_name=my_dag,task_type="bash",task_id="task_8",command='echo " The Dag [ {} ] ended at [ `date` ]"; '.format(self.dag_name),branch_trigger_rule="none_failed")
Venkatesh Gotimukul
  • 741
  • 1
  • 9
  • 30

1 Answers1

1

As far as I know, the trigger rule refers to the direct upstream tasks only

So "all_success" on Task7, checks only Task6 status

If you want task7 (and task8 after him) to run only if task5_1, task5_2 and task6 has succeeded,
then add another relationship as follows:
[task5_1, task5_2] -> task7 and set the trigger rule of task7 as "all_success"

enter image description here

Saar Levy
  • 330
  • 2
  • 6