Is it possible to have a multi-node Dask cluster be the compute for a PythonScriptStep
with AML Pipelines?
We have a PythonScriptStep that uses featuretools' deep feature synthesis (dfs) (docs). ft.dfs() has a parameter, n_jobs, which allows for parallelization. When we run it on a single machine, the job takes three hours; it runs much faster on a Dask cluster. How can I operationalize this within an Azure ML pipeline?