We have a large project that comprises numerous tasks. We use a dask graph to schedule each task. A small sample of the graph is shown below. Note that dask is set to multiprocessing mode.
dask_graph:
  universe: !!python/tuple [gcsstrategies.svc.business_service.UniverseService.load_universe_object, CONTEXT]
  raw_market_data: !!python/tuple [gcsstrategies.svc.data_loading_service.RDWLoader.load_market_data, CONTEXT, universe]
  raw_fundamental_data: !!python/tuple [gcsstrategies.svc.data_loading_service.RDWLoader.load_fundamental_data, CONTEXT, universe]
dask_keys: [raw_fundamental_data]
Now one of the tasks, raw_fundamental_data, lazily schedules dask tasks using @dask.delayed and runs them using dask.compute(). The reason for this design choice is that the list of tasks scheduled and lazily run by dask within raw_fundamental_data is chosen dynamically at runtime, based on runtime parameters.
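To make the pattern concrete, here is a minimal sketch of roughly what raw_fundamental_data does. The helper load_field and the "fields" entry in the context are hypothetical stand-ins, not our real code. The inner compute uses the synchronous scheduler only so that the sketch runs on its own; in our system the inner compute uses multiprocessing, which is where the error appears.

```python
import dask
from dask import delayed


@delayed
def load_field(name):
    # Hypothetical stand-in for one dynamically chosen sub-task.
    return {name: len(name)}


def load_fundamental_data(context, universe):
    # Hypothetical: which sub-tasks to run is only known at runtime.
    fields = context["fields"]

    # Build the lazy tasks; nothing is executed yet.
    tasks = [load_field(field) for field in fields]

    # Run the lazy tasks. The synchronous scheduler is an assumption made
    # for this sketch so it can run standalone, without worker processes.
    results = dask.compute(*tasks, scheduler="synchronous")

    merged = {}
    for result in results:
        merged.update(result)
    return merged
```

For example, load_fundamental_data({"fields": ["eps", "revenue"]}, None) builds two delayed tasks and merges their results into a single dict.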
The error we see is:
daemonic processes are not allowed to have children
We understand this is because a process spawned by the multiprocessing scheduler (a daemonic worker) is itself trying to spawn children. Is there any solution to this problem? Does dask have any way to allow a task scheduled via the dask graph to schedule and lazily run its own tasks, either using @dask.delayed or another method?
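As far as we can tell, the same message can be reproduced without dask, using only the standard multiprocessing module. The sketch below is an illustration, not our actual code: the function names are hypothetical, and it assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp

# Assumption: "fork" is available (POSIX only). It keeps the sketch
# self-contained because the target functions need not be importable.
ctx = mp.get_context("fork")


def child():
    # Hypothetical grandchild task; it never actually runs.
    pass


def daemon_worker(conn):
    # Runs inside a daemonic process and tries to start its own child.
    try:
        proc = ctx.Process(target=child)
        proc.start()  # multiprocessing refuses this in a daemonic process
        proc.join()
        conn.send("started child")
    except AssertionError as exc:
        conn.send(str(exc))
    finally:
        conn.close()


def reproduce():
    parent_conn, child_conn = ctx.Pipe()
    worker = ctx.Process(target=daemon_worker, args=(child_conn,), daemon=True)
    worker.start()
    message = parent_conn.recv()
    worker.join()
    return message


if __name__ == "__main__":
    print(reproduce())
```

Calling reproduce() returns the text of the error shown above, because the daemonic worker is not allowed to start the nested process.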
Please note that our system has numerous tasks that run their own tasks using multiprocessing, so sequential execution is not an option.