I would like to direct all dask temporary data to my fast and big disk at /mnt/1. I am running the scheduler like so:

dask-scheduler --local-directory /mnt/1

and the workers:

dask-worker 127.0.0.1:8786 --memory-limit 16GB --nthreads 1 --nprocs 6 --local-directory /mnt/1/

My imports look like this:

import dask
from dask import dataframe as dd
from dask import delayed
from dask.distributed import Client
client = Client('localhost:8786', set_as_default=True)
dask.config.set(shuffle='disk')

And yet, I am still seeing a partd directory being created and filled with stuff in my /tmp directory, which is not on my fast and big disk.

My question is: how do I tell dask distributed to send absolutely all temporary data to /mnt/1 and not put anything in /tmp?

Stephen
  • possible duplicate of https://stackoverflow.com/questions/40042748/how-to-specify-the-directory-that-dask-uses-for-temporary-files? – Arco Bast May 17 '19 at 22:59

1 Answer

This appears to work; note the newly added last line, which sets temporary_directory in the client process. It is a bit annoying that the command-line flags don't actually do what they suggest.

import dask
from dask import dataframe as dd
from dask import delayed
from dask.distributed import Client
client = Client('localhost:8786', set_as_default=True)
dask.config.set(shuffle='disk')
dask.config.set({'temporary_directory': '/mnt/1'})
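
To check that the setting actually took effect in the client process, you can read it back with dask.config.get. This is only a minimal sketch, assuming dask is installed; `scratch` is a hypothetical stand-in for /mnt/1 so that the snippet runs on any machine:

```python
import tempfile

import dask

# Hypothetical stand-in for /mnt/1; replace with the real path on your machine.
scratch = tempfile.mkdtemp(prefix="dask-scratch-")

# Used as a context manager, dask.config.set only applies inside the block.
with dask.config.set({"temporary_directory": scratch}):
    inside = dask.config.get("temporary_directory")

# After the block the previous value is restored (None unless configured elsewhere).
outside = dask.config.get("temporary_directory", None)

print("inside the block: ", inside)
print("outside the block:", outside)
```

If the value printed inside the block is the directory you expect, disk-based shuffles started from this process should create their partd directory there rather than in /tmp.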
Stephen
  • You can also set it in the [dask configuration](https://docs.dask.org/en/latest/configuration.html) in your home folder. But yes, a command-line option would be nice. – gies0r Aug 05 '20 at 15:44
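
As a rough sketch of the config-file route mentioned in the comment above: by default dask picks up YAML files from ~/.config/dask. The snippet below writes the same setting to a throwaway directory and uses dask.config.collect to show how dask would parse it. The throwaway directory is a stand-in for the real config directory, and the file name dask.yaml is just an example, so treat both as assumptions:

```python
import os
import tempfile

import dask.config

# Stand-in for ~/.config/dask so the sketch does not touch your real config.
config_dir = tempfile.mkdtemp(prefix="dask-config-")

# Example file name; the YAML key uses a hyphen, matching dask's own defaults.
with open(os.path.join(config_dir, "dask.yaml"), "w") as f:
    f.write("temporary-directory: /mnt/1\n")

# Parse only this directory and ignore DASK_* environment variables.
cfg = dask.config.collect(paths=[config_dir], env={})
print(cfg["temporary-directory"])
```

Placing such a file in the real config directory on every machine that runs a client or worker would presumably give each process the same default without calling dask.config.set in code.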