CeleryExecutor in Airflow is not parallelizing tasks in a subdag

Question

We're using Airflow 1.10.0. After analyzing why some of our ETL processes take so long, we saw that subdags use a SequentialExecutor instead of the default executor (BaseExecutor), even when we configure the CeleryExecutor.

I would like to know whether this is a bug or expected behavior in Airflow. It doesn't make sense to have the ability to execute tasks in parallel and then lose it for one specific kind of task.

score 10 · Answer 1 · answered Aug 16 '18 at 23:10

It is a typical pattern to use the SequentialExecutor in subdags, on the idea that you are often executing many similar, related tasks and don't necessarily want the added overhead of queueing each one through Celery. See the "other tips" section in the Airflow docs for subdags: https://airflow.apache.org/concepts.html#subdags

By default, subdags use the SequentialExecutor (see: https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/operators/subdag_operator.py#L38), but you can change that.
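To get a feel for why the default matters, here is a rough, Airflow-free sketch of the difference between running a subdag's tasks one at a time and running them on a pool of workers. It uses only the Python standard library; `fake_task`, `run_sequentially`, and `run_in_parallel` are hypothetical names made up for this illustration, and the thread pool is only a stand-in for a parallel executor, not how Airflow or Celery actually schedule work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_task(seconds):
    """Stand-in for one task inside a subdag; it only sleeps."""
    time.sleep(seconds)
    return seconds


def run_sequentially(durations):
    """Run the tasks one after another, as a one-at-a-time executor would."""
    start = time.perf_counter()
    results = [fake_task(d) for d in durations]
    return results, time.perf_counter() - start


def run_in_parallel(durations, workers):
    """Run the same tasks on a pool of workers (stand-in for a parallel executor)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        results = list(pool.map(fake_task, durations))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    durations = [0.2, 0.2, 0.2, 0.2]
    _, seq_elapsed = run_sequentially(durations)
    _, par_elapsed = run_in_parallel(durations, workers=4)
    print("sequential: {:.2f}s, parallel: {:.2f}s".format(seq_elapsed, par_elapsed))
```

With independent tasks, the sequential run takes roughly the sum of the task durations, while the pooled run takes roughly as long as the slowest task. That gap is what shows up as slow ETL subdags when the executor is left at its default.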

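To use the CeleryExecutor, pass it as the executor when you create the subdag:

from airflow.executors.celery_executor import CeleryExecutor
from airflow.operators.subdag_operator import SubDagOperator

mysubdag = SubDagOperator(
    executor=CeleryExecutor(),  # overrides the default SequentialExecutor
    ...
)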

Is the behavior the same for `KubernetesExecutor`? — Flavio, Aug 17 '18 at 08:14

score 3 · Answer 2 · answered May 04 '19 at 22:17

Maybe a little late, but using the LocalExecutor works for me.

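from airflow.executors.local_executor import LocalExecutor
from airflow.operators.subdag_operator import SubDagOperator

subdag = SubDagOperator(
    task_id=task_id,
    default_args=default_args,
    executor=LocalExecutor(),  # runs the subdag's tasks in parallel local processes
    dag=dag
)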

Answered by strider.
