
I found an answer suggesting billiard for multiprocessing in Airflow, since Python's standard multiprocessing cannot pickle the task callables and joblib's Parallel can only be used in threading mode. I tried billiard's Pool.map, but the program just hangs at that part and no error is raised. I'm using the newest Airflow version, 2.5.0. How can I make this work, and what is the right way to do multiprocessing inside Airflow tasks? Thanks very much!

import pendulum
from billiard import Pool
from airflow import DAG
from airflow.decorators import task

with DAG(dag_id='ttest',
         schedule_interval="40 * * * *") as dag:
    @task(task_id='te')
    def test_task(ds=None, **kwargs):
        def test(l):
            return sum(l)

        a = [[1, 2], [2, 3], [3, 4]]
        print('start pool')
        with Pool(2) as pool:
            res = pool.map(test, a)
        print(res)

    t = test_task()
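For comparison, the same map-over-sublists pattern does work in a plain script (outside Airflow) with the stdlib `multiprocessing.Pool`, provided the worker is defined at module level so it can be pickled. This is only a standalone sketch with hypothetical names (`part_sum`, `run_pool`), not the Airflow fix itself.

```python
from multiprocessing import Pool

def part_sum(pair):
    # Module-level, so workers can unpickle it by reference.
    return sum(pair)

def run_pool():
    data = [[1, 2], [2, 3], [3, 4]]
    with Pool(2) as pool:
        return pool.map(part_sum, data)

if __name__ == '__main__':
    print(run_pool())  # [3, 5, 7]
```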

0 Answers