Is there a way to limit the number of cores used by the default threaded scheduler (default when using dask dataframes)?
With compute, you can specify it by using:

    df.compute(get=dask.threaded.get, num_workers=20)
But I was wondering if there is a way to set this as the default, so you don't need to specify this for each compute call?
This would e.g. be interesting in the case of a small cluster (e.g. of 64 cores) that is shared with other people (without a job system), where I don't necessarily want to take up all cores when starting computations with dask.