Context: I'm using custom dask graphs to manage and distribute computations.

Problem: Some tasks read files that are produced outside of dask and are not necessarily available when `dask.get(graph, result_key)` is called.

Question: Having the I/O tasks wait for the files is not an option, as this would block workers. Is there a good way to let dask wait for the files to become available and only then execute the I/O tasks?

Thanks a lot for any thoughts!

malbert

1 Answer

It sounds like you might want to use some of the more real-time features of Dask, described here.

You might consider writing tasks that use `secede` and `rejoin`, or using async-await style programming and only launching tasks once your client process notices that the files exist.
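For the second option, here is a minimal synchronous sketch of what "only launch tasks once the client notices the files" could look like. The helper name `submit_when_ready` and its parameters are hypothetical, not part of dask; `client` is assumed to be a `dask.distributed.Client`, and the only method used on it is `submit`. The same idea could be written with async-await, this version just polls in a plain loop.

```python
import os
import time


def submit_when_ready(client, func, paths, poll_interval=1.0, timeout=None):
    """Submit func(path) for each path as soon as that file exists.

    Hypothetical helper: `client` is assumed to be a dask.distributed.Client
    (only its .submit method is used). Returns a dict {path: future}.
    """
    pending = list(paths)
    futures = {}
    deadline = None if timeout is None else time.monotonic() + timeout
    while pending:
        for path in list(pending):
            if os.path.exists(path):
                # The file is there, so the task can be launched right away.
                futures[path] = client.submit(func, path)
                pending.remove(path)
        if not pending:
            break
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"still missing: {pending}")
        # The waiting happens in the client process, not on a worker.
        time.sleep(poll_interval)
    return futures
```

The returned futures can be passed on to further `client.submit` calls or collected with `client.gather`. Note that this sketch only checks that a file exists; it assumes the producer makes files visible atomically (for example by writing to a temporary name and renaming), otherwise a task might read a half-written file.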

MRocklin
  • Great, I didn't know this functionality. Basically the task waiting for the file would contain `secede()`, a waiting loop and then `rejoin()`. If I understand correctly, this makes the task wait for the file in an administrative thread, thereby not blocking the worker's thread pool. Thanks! – malbert Jul 09 '19 at 17:36
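The pattern described in this comment could be sketched roughly as follows. The function names `wait_for_file` and `read_when_available`, the polling interval, and the timeout are assumptions made for illustration; `secede` and `rejoin` are the functions from `distributed`. After `secede()` the task keeps running in its own thread, but that thread no longer counts against the worker's thread pool, so the worker is free to start another task in the meantime.

```python
import os
import time


def wait_for_file(path, poll_interval=1.0, timeout=None):
    """Block until `path` exists; raise TimeoutError after `timeout` seconds."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} s")
        time.sleep(poll_interval)
    return path


def read_when_available(path, poll_interval=1.0, timeout=None):
    """Task body: give up the worker slot while waiting, then read the file."""
    # Imported here so the import runs on the worker that executes the task.
    from distributed import secede, rejoin

    secede()  # leave the worker's thread pool so another task can use the slot
    try:
        wait_for_file(path, poll_interval, timeout)
    finally:
        rejoin()  # block until a slot is free, then re-enter the thread pool
    with open(path, "rb") as f:
        return f.read()
```

This only works when the graph runs on the distributed scheduler, since `secede` needs to be called from inside a task on a worker. With a custom graph, that would mean an entry such as `"raw": (read_when_available, "input.dat")` executed through `client.get(graph, result_key)` rather than the local `dask.get`.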