I am trying to load a dataset with Dask, but when it comes time to compute it I keep running into problems like this:
WARNING - Worker exceeded 95% memory budget. Restarting.
I am just working on my local machine, initializing Dask as follows:
from dask.distributed import Client

if __name__ == '__main__':
    libmarket.config.client = Client()  # use dask.distributed by default
Now in my error messages I keep seeing a reference to a 'memory_limit=' keyword parameter. However, I've searched the Dask documentation thoroughly and I can't figure out how to increase the bloody worker memory limit in a single-machine configuration. I have 256GB of RAM, and I'm dropping the majority of the dataframe's columns (it comes from a 20GB csv file) before converting it back into a pandas dataframe, so I know the result will fit in memory. I just need to increase the per-worker memory limit from my code (not via the dask-worker command line) so that I can process it.
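For context, here is roughly what I've been attempting. This is a guess on my part: I'm assuming memory_limit is a per-worker setting on LocalCluster (and that Client accepts a cluster built this way), but I can't confirm that from the docs, and the file path and column names below are placeholders:

from dask.distributed import Client, LocalCluster
import dask.dataframe as dd

if __name__ == '__main__':
    # Assumption: memory_limit is the budget per worker, so with 4 workers
    # this would allow up to 4 x 60GB = 240GB of my 256GB of RAM.
    cluster = LocalCluster(n_workers=4, memory_limit='60GB')
    client = Client(cluster)

    ddf = dd.read_csv('market_data.csv')  # placeholder path for my 20GB csv
    ddf = ddf[['timestamp', 'price']]     # keep only the columns I need (placeholder names)
    df = ddf.compute()                    # materialize as a plain pandas dataframe

Is this the right way to do it, or is there a proper knob for the single-machine case that I'm missing?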
Please, somebody help me.