If I spin up a Dask cluster with N workers and then submit more than N jobs using cluster.compute, does Dask try to run all the jobs simultaneously (by scheduling more than one job on each worker), or are the jobs queued and run sequentially?
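For concreteness, this is roughly what I am doing (a simplified sketch; the worker count, job count, and the function are placeholders for my real setup):

```python
import dask
from dask.distributed import Client, LocalCluster

# N = 4 workers, just as an example
cluster = LocalCluster(n_workers=4)
client = Client(cluster)

@dask.delayed
def memory_hungry_job(i):
    # placeholder for my real, memory-intensive computation
    return sum(range(10_000_000)) + i

# submit more jobs than there are workers (here 10 > 4)
futures = client.compute([memory_hungry_job(i) for i in range(10)])
results = client.gather(futures)
```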
My recent experience suggests the former: each job is quite memory intensive, and submitting more jobs than workers causes them all to crash with memory errors, which I take to mean several jobs were running on the same worker at once.
Is there a way to force Dask to run strictly one job per worker at a time and queue the remaining jobs?
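The closest thing I could think of is starting each worker with a single thread, on the assumption that a worker then only executes one task at a time, but I am not sure whether that is the right or intended way to get this behaviour (sketch below, same placeholder job as above):

```python
from dask.distributed import Client, LocalCluster

# one thread per worker, so presumably each worker only runs
# one task at a time -- is this the recommended approach?
cluster = LocalCluster(n_workers=4, threads_per_worker=1)
client = Client(cluster)
```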