I am using a dask.distributed
scheduler and workers to process some large microscopy images on a cluster. I run multiple workers per node (1 core = 1 worker), and all cores on a node share 200 GB of RAM.
Issue
I would like to reduce the number of write operations to the cluster's shared hard drive.
Questions
- The idea is to create a dictionary shared within a node, fill it with processed images until its size reaches ~80% of the RAM, and then save each image in the dictionary to the hard drive as a separate file. Is it possible to share a dictionary between the workers on a node?
- Each image in the dictionary will be written to a different file. Does writing them by looping through the dictionary make a difference, or will the speed and number of I/O calls be the same as writing one image at a time during processing?
I don't have a running example because I couldn't figure out how to share a variable between workers on the same node.
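To make the intent concrete, here is a minimal sketch of the buffering logic I have in mind (pure Python, no Dask; the byte threshold, the output directory, and the raw-bytes representation of an image are placeholders I made up):

```python
import os
import tempfile

class ImageBuffer:
    """Accumulate processed images in memory and flush them to disk
    in one pass once the total size crosses a threshold."""

    def __init__(self, out_dir, max_bytes):
        self.out_dir = out_dir
        self.max_bytes = max_bytes  # placeholder for ~80% of node RAM
        self.images = {}            # filename -> raw image bytes
        self.total = 0

    def add(self, name, data):
        self.images[name] = data
        self.total += len(data)
        if self.total >= self.max_bytes:
            self.flush()

    def flush(self):
        # Still one write call per image, but grouped into a single pass
        for name, data in self.images.items():
            with open(os.path.join(self.out_dir, name), "wb") as f:
                f.write(data)
        self.images.clear()
        self.total = 0

# usage: buffer two tiny "images"; the second add crosses the threshold
out = tempfile.mkdtemp()
buf = ImageBuffer(out, max_bytes=8)
buf.add("a.raw", b"1234")       # 4 bytes, below threshold, stays in memory
buf.add("b.raw", b"5678")       # total hits 8 bytes -> flushed to disk
print(sorted(os.listdir(out)))  # -> ['a.raw', 'b.raw']
```

The open question is what to replace this plain dict with so that all workers on one node feed the same buffer.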
Thanks