I am using a dask.distributed
scheduler and workers to process some large microscopy images on a cluster. I run multiple workers per node (1 core = 1 worker), and all cores on a node share 200 GB of RAM.
Issue
I would like to reduce the number of write operations to the cluster's shared hard drive.
Questions
- The idea is to create a dictionary shared within a node, fill it with processed images until its size reaches ~80% of the RAM, and then save each image in the dictionary to the hard drive as a separate file. Is it possible to share a dictionary between the workers on a node?
- Each image in the dictionary will be written to a different file. Does writing them by looping through the dictionary make a difference, or will the speed and number of I/O calls be the same as writing one image at a time during processing?
I don't have a running example because I couldn't figure out how to share a variable between workers on the same node.
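To make the intent concrete, here is a minimal sketch of the buffering logic I have in mind (pure Python, no Dask; the byte threshold, the output directory, and the raw-bytes representation of an image are placeholders I made up):

```python
import os
import tempfile

class ImageBuffer:
    """Accumulate processed images in memory and flush them to disk
    in one pass once the total size crosses a threshold."""

    def __init__(self, out_dir, max_bytes):
        self.out_dir = out_dir
        self.max_bytes = max_bytes  # placeholder for ~80% of node RAM
        self.images = {}            # filename -> raw image bytes
        self.total = 0

    def add(self, name, data):
        self.images[name] = data
        self.total += len(data)
        if self.total >= self.max_bytes:
            self.flush()

    def flush(self):
        # Still one write call per image, but grouped into a single pass
        for name, data in self.images.items():
            with open(os.path.join(self.out_dir, name), "wb") as f:
                f.write(data)
        self.images.clear()
        self.total = 0

# usage: buffer two tiny "images"; the second add crosses the threshold
out = tempfile.mkdtemp()
buf = ImageBuffer(out, max_bytes=8)
buf.add("a.raw", b"1234")       # 4 bytes, below threshold, stays in memory
buf.add("b.raw", b"5678")       # total hits 8 bytes -> flushed to disk
print(sorted(os.listdir(out)))  # -> ['a.raw', 'b.raw']
```

The open question is what to replace this plain dict with so that all workers on one node feed the same buffer.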
Thanks