I am running Dask on a single machine, where calling .compute() on a huge Parquet file causes Dask to use up all the CPU cores on the system.
import dask.dataframe as dd
df = dd.read_parquet(parquet_file)  # very large file
print(df.names.unique().compute())
Is it possible to configure Dask to use a specific number of CPU cores and to limit its memory usage to, say, 32 GB? I am using Python 3.7.2 and Dask 2.9.2.
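I came across dask.distributed's LocalCluster, which appears to take n_workers, threads_per_worker and memory_limit arguments, but I am not sure whether this is the right way to cap resources on a single machine. Here is roughly what I had in mind (untested sketch; the 4 workers x 8 GB split is just a guess at how to reach the 32 GB total):

from dask.distributed import Client, LocalCluster
import dask.dataframe as dd

# Restrict Dask to 4 worker processes with 1 thread each,
# and roughly 8 GB of memory per worker (4 x 8 GB = 32 GB total).
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit='8GB')
client = Client(cluster)

df = dd.read_parquet(parquet_file)  # very large file
print(df.names.unique().compute())

Is this the recommended approach, or is there a simpler way (e.g. a dask.config setting) to achieve the same limits?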