`dask.distributed` keeps data in memory on workers until that data is no longer needed. (Thanks @MRocklin!) While this process is efficient in terms of network usage, it still results in frequent pickling and unpickling of data. I assume this is not zero-copy pickling, like the memory mapping that `plasma` or `joblib` provide for `numpy` arrays.
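To make sure I understand the "kept in memory on workers" part, here is a small sketch (the local cluster and the array size are just assumptions for illustration): the result stays in worker memory while a future references it, and only gets serialized when it has to move.

```python
import numpy as np
from dask.distributed import Client

client = Client()  # assumed local cluster, just for illustration

# The result lives in the memory of whichever worker ran the task;
# the client only holds a lightweight Future pointing at it.
future = client.submit(np.ones, 10_000_000)

# who_has shows which worker(s) currently hold the data in memory.
print(client.who_has(future))

# Pulling the result back to the client (or to another process)
# requires serializing it and sending it over the connection.
local_copy = future.result()
```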
It's clear that when a calculation result is needed on another host or another worker, it has to be pickled and sent over. This can be avoided by relying on threaded parallelism inside a single host, since all calculations access the same memory (`--nthreads=8`). `dask` uses this trick and simply accesses the result of a calculation in place instead of pickling it.
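The threads-only case I mean looks roughly like this (the parameters are assumptions, not a definitive setup): with one worker process and several threads, a dependent task can use its input directly from shared memory, with no pickling involved.

```python
from dask.distributed import Client

# One worker process, 8 threads: roughly the --nthreads=8 setup.
client = Client(processes=False, n_workers=1, threads_per_worker=8)

def double(x):
    return x * 2

a = client.submit(sum, range(1_000_000))
b = client.submit(double, a)  # runs in the same process; uses the result of `a` directly
print(b.result())
```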
When we're using process-based parallelization instead of threading on a single host (`--nprocs=8 --nthreads=1`), I'd expect that `dask` pickles and unpickles again to send data to the other worker. Is this correct?
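For concreteness, this is the kind of setup I'm asking about (a sketch only; the worker count, array size, and whether the second task actually lands on a different worker are assumptions):

```python
import numpy as np
from dask.distributed import Client

# Roughly the multi-process case: 8 worker processes, 1 thread each.
client = Client(processes=True, n_workers=8, threads_per_worker=1)

a = client.submit(np.ones, 1_000_000)  # result lives in one worker process
b = client.submit(np.sum, a)           # if scheduled on a different worker, does `a`
result = b.result()                    # get pickled and shipped between the processes?
```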