I have two sets of operators in Airflow. The tasks within each set run in parallel, and the second set is downstream of the first.
chain([task_1a, task_2a, task_3a], [task_1b, task_2b, task_3b], end_task)
I used the chain() function because the >> bitshift operator doesn't work between two lists, i.e., this does not work:
[task_1a, task_2a, task_3a] >> [task_1b, task_2b, task_3b] >> end_task
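To make the intended wiring explicit: as far as I understand it, chain() links two adjacent lists of equal length pairwise, and links a single task to every task in a neighbouring list. The plain-Python sketch below only models that behaviour with task ID strings. `chain_edges` is a hypothetical helper written for illustration, not part of Airflow.

```python
def chain_edges(*steps):
    """Model the dependency edges a chain()-style call would create.

    Hypothetical helper for illustration only; it does not import Airflow.
    """
    edges = []
    for up, down in zip(steps, steps[1:]):
        if isinstance(up, list) and isinstance(down, list):
            # Two adjacent lists are linked pairwise, so lengths must match.
            if len(up) != len(down):
                raise ValueError("adjacent lists must have the same length")
            edges.extend(zip(up, down))
        else:
            # A single task fans out to, or in from, every task in a list.
            ups = up if isinstance(up, list) else [up]
            downs = down if isinstance(down, list) else [down]
            edges.extend((u, d) for u in ups for d in downs)
    return edges


edges = chain_edges(
    ["task_1a", "task_2a", "task_3a"],
    ["task_1b", "task_2b", "task_3b"],
    "end_task",
)
for upstream, downstream in edges:
    print(f"{upstream} >> {downstream}")
```

So task_1a feeds only task_1b, task_2a feeds only task_2b, and so on, while all three "b" tasks feed end_task.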
Now I want to add a second variation of this operator pipeline based on some condition. I figured I could do this via branching and the BranchPythonOperator. AFAIK, the callable passed to the BranchPythonOperator returns either a single task ID string or a list of task ID strings. However, I have not found any public documentation or successful examples of using the BranchPythonOperator to select a chained sequence of tasks that includes parallel tasks.
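For illustration, a branch callable that returns a list of task IDs could look like the sketch below. The function name, the `value` argument, and the `threshold` default are assumptions made up for this example. My understanding is that every returned ID has to belong to a task directly downstream of the branching task.

```python
def choose_parallel_tasks(value, threshold=10):
    """Return the task IDs to follow (hypothetical example callable).

    A branch callable may return one task ID or a list of task IDs.
    """
    if value > threshold:
        # Full variation: follow all three parallel tasks.
        return ["task_1a", "task_2a", "task_3a"]
    # Reduced variation: follow only the first two parallel tasks.
    return ["task_1a", "task_2a"]


print(choose_parallel_tasks(6))
print(choose_parallel_tasks(11))
```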
I've tried the method below as well as other variations, but so far I've encountered issues with operators downstream of 'option1' or 'option2' being skipped entirely.
Another Stack Overflow post mentioned needing to alter the trigger rules when branching, so I've also tried setting the trigger_rule of end_task to 'all_success', but that has no effect either.
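from datetime import datetime

from airflow import DAG
from airflow.operators.python import BranchPythonOperator
from airflow.operators.dummy import DummyOperator
# In Airflow 2, chain lives in airflow.models.baseoperator.
from airflow.models.baseoperator import chain

# Placeholder defaults so the example is self-contained (values assumed).
default_args = {'start_date': datetime(2021, 1, 1)}


def _choose_best_model():
    # Hard-coded for the example; 6 is not > 10, so this returns 'option2'.
    value = 6
    if value > 10:
        return 'option1'
    else:
        return 'option2'


with DAG('branching', schedule_interval='@daily', default_args=default_args, catchup=False) as dag: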
    choose_best_model = BranchPythonOperator(
        task_id='choose_best_model',
        python_callable=_choose_best_model
    )
    option1 = DummyOperator(
        task_id='option1'
    )
    option2 = DummyOperator(
        task_id='option2'
    )
    # Parallel tasks
    task_1a = DummyOperator(
        task_id='task_1a'
    )
    task_2a = DummyOperator(
        task_id='task_2a'
    )
    task_3a = DummyOperator(
        task_id='task_3a'
    )
    task_1b = DummyOperator(
        task_id='task_1b'
    )
    task_2b = DummyOperator(
        task_id='task_2b'
    )
    task_3b = DummyOperator(
        task_id='task_3b'
    )
    end_task = DummyOperator(
        task_id='end_task'
    )
    choose_best_model >> [option1, option2]
    chain(option1, [task_1a, task_2a, task_3a], [task_1b, task_2b, task_3b], end_task)
    chain(option2, [task_1a, task_2a], [task_1b, task_2b], end_task)
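To reason about why the downstream tasks get skipped, here is a plain-Python sketch that models just two trigger rules over the DAG above. It is a simplified model, not Airflow's scheduler: `resolve`, `simulate`, and `UPSTREAM` are hypothetical names, every task that runs is assumed to succeed, and the branch is assumed to pick 'option2'. As far as I know, 'all_success' is already the default rule, and 'none_failed_min_one_success' is the Airflow 2.2+ name (earlier versions call it 'none_failed_or_skipped').

```python
FAILED_STATES = {"failed", "upstream_failed"}


def resolve(trigger_rule, upstream_states):
    """Return the modelled state of a task given its upstream states."""
    if any(state in FAILED_STATES for state in upstream_states):
        return "upstream_failed"
    if trigger_rule == "all_success":
        # A single skipped upstream task causes this task to be skipped.
        return "skipped" if "skipped" in upstream_states else "success"
    if trigger_rule == "none_failed_min_one_success":
        # Skipped upstream tasks are tolerated if at least one succeeded.
        return "success" if "success" in upstream_states else "skipped"
    raise ValueError(f"rule not modelled: {trigger_rule}")


# Upstream tasks per task, as produced by the two chain() calls above.
# The dict is written in topological order so one pass is enough.
UPSTREAM = {
    "task_1a": ["option1", "option2"],
    "task_2a": ["option1", "option2"],
    "task_3a": ["option1"],
    "task_1b": ["task_1a"],
    "task_2b": ["task_2a"],
    "task_3b": ["task_3a"],
    "end_task": ["task_1b", "task_2b", "task_3b"],
}


def simulate(rules=None):
    """Propagate states through the DAG, assuming the branch chose option2."""
    rules = rules or {}
    states = {"option1": "skipped", "option2": "success"}
    for task, parents in UPSTREAM.items():
        rule = rules.get(task, "all_success")
        states[task] = resolve(rule, [states[p] for p in parents])
    return states


default_run = simulate()
relaxed_rules = {
    task: "none_failed_min_one_success"
    for task in ("task_1a", "task_2a", "end_task")
}
relaxed_run = simulate(relaxed_rules)

print("default rules:", default_run["end_task"])
print("relaxed rules:", relaxed_run["end_task"])
```

In this model, task_1a and task_2a are skipped under the default rule because option1 is one of their upstream tasks and it was skipped, and that skip then cascades to end_task. Relaxing the rule on the tasks that join the two branches (task_1a, task_2a, and end_task) lets the option2 path run while task_3a and task_3b stay skipped.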