Let's say I have a delayed function that performs some task but needs a dict to store intermediate key/value pairs, which are read and modified in each Dask worker.

Can delayed or another mechanism be used to share the cache dict across workers?

I can't seem to find any documentation about doing this.

Nathan McCoy

2 Answers

You could probably achieve what you want using actors, which, be warned, are marked as "experimental" and do not see much use. The data structure would be stored on one particular worker, and other workers would communicate with it to effect changes. Therefore, if there's any chance of that worker going down, you stand to lose the stored results.
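As a rough illustration of the actor approach, here is a minimal sketch, assuming `dask.distributed` is installed. The names `SharedCache`, `task`, and `run_example` are made up for this example, and `Client(processes=False)` is only there to keep the demo in a single local process.

```python
class SharedCache:
    """Plain Python class; Dask treats it as an actor when submitted with actor=True."""

    def __init__(self):
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value

    def add(self, key, amount=1):
        # Do the read-modify-write inside the actor so callers
        # do not need a separate get followed by a put.
        self.data[key] = self.data.get(key, 0) + amount
        return self.data[key]


def task(x, cache):
    # `cache` is an actor handle; method calls return a future-like object.
    return cache.add("total", x).result()


def run_example():
    # Imported lazily so the class above can be used without Dask installed.
    from dask.distributed import Client

    client = Client(processes=False)  # local demo cluster (assumption)
    try:
        # The dict lives on whichever single worker hosts the actor.
        cache = client.submit(SharedCache, actor=True).result()
        futures = [client.submit(task, x, cache) for x in range(5)]
        client.gather(futures)
        return cache.get("total").result()
    finally:
        client.close()
```

Calling `run_example()` starts a local client, runs five tasks that each update the shared dict, and returns the accumulated value. The cache is lost if the worker holding the actor dies, as noted above.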

Naturally, you could instead have your tasks talk to any external key/value store: in-cluster options like Redis or even a shared filesystem, or external options like cloud storage.

mdurant

There are a variety of ways to coordinate data between workers. I recommend looking at Dask's Coordination Primitives (locks, queues, global variables, and so on).
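One possible way to combine two of those primitives, sketched under assumptions: a `Variable` holds a small dict and a `Lock` guards each read-modify-write. The names `"shared-cache"`, `"shared-cache-lock"`, `bump`, `locked_update`, and `run_example` are invented for this example, and this is only suitable for small dicts, since the value travels through the scheduler on every update.

```python
def bump(cache, key, amount=1):
    """Return a copy of `cache` with `key` increased by `amount` (pure helper)."""
    updated = dict(cache)
    updated[key] = updated.get(key, 0) + amount
    return updated


def locked_update(key, amount=1):
    """Task body: update the shared dict while holding a cluster-wide lock."""
    # Imported lazily so `bump` can be used without Dask installed.
    from dask.distributed import Lock, Variable

    var = Variable("shared-cache")      # hypothetical variable name
    with Lock("shared-cache-lock"):     # hypothetical lock name
        var.set(bump(var.get(), key, amount))


def run_example():
    from dask.distributed import Client, Variable

    client = Client(processes=False)  # local demo cluster (assumption)
    try:
        var = Variable("shared-cache")
        var.set({})  # initialize before any task reads it
        # pure=False so identical calls are not collapsed into one task
        futures = [
            client.submit(locked_update, "hits", pure=False) for _ in range(4)
        ]
        client.gather(futures)
        return var.get()
    finally:
        client.close()
```

The lock matters because `var.get()` followed by `var.set()` is not atomic on its own; without it, two tasks could read the same dict and one update would overwrite the other.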

MRocklin