I'm implementing various kinds of data processing pipelines using dask.distributed. Usually the original data is read from S3, and at the end the processed (large) collection is written back to S3 as CSV.
I can run the processing asynchronously and monitor progress, but I've noticed that all to_xxx() methods that store collections to file(s) seem to be synchronous calls. One downside is that the call blocks, potentially for a very long time; another is that I cannot easily construct a complete graph to be executed later.
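For concreteness, here is a minimal sketch of the pattern I mean (the scheduler address, bucket paths, and the `value` column are all made up):

```python
from dask.distributed import Client
import dask.dataframe as dd

client = Client('scheduler:8786')  # hypothetical scheduler address

# Hypothetical bucket/paths; any processing step would do here.
df = dd.read_csv('s3://my-bucket/input/*.csv')
processed = df[df['value'] > 0]  # 'value' is an assumed column name

# This blocks until every partition has been written to S3:
processed.to_csv('s3://my-bucket/output/part-*.csv')
```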
Is there a way to run e.g. to_csv() asynchronously and get a future object instead of blocking?
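Roughly, the calling convention I'm hoping for would look something like this (hypothetical, not current behavior):

```python
from dask.distributed import progress

# Desired: return immediately with a future instead of blocking.
future = processed.to_csv('s3://my-bucket/output/part-*.csv')
progress(future)  # monitor like any other async work
future.result()   # block only when completion is actually needed
```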
PS: I'm pretty sure that I can implement async storage myself, e.g. by converting the collection to delayed() and storing each partition. But this seems like a common case - unless I missed an already existing feature, it would be nice to have something like this included in the framework.
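For illustration, a rough sketch of that workaround (write_partition and the paths are hypothetical; assumes s3fs is available on the workers):

```python
import s3fs
from dask import delayed

def write_partition(part, path):
    # Runs on a worker: write one pandas partition to S3 as CSV.
    fs = s3fs.S3FileSystem()
    with fs.open(path, 'w') as f:
        part.to_csv(f, index=False)
    return path

parts = processed.to_delayed()  # one Delayed per partition
tasks = [delayed(write_partition)(p, 's3://my-bucket/output/part-%d.csv' % i)
         for i, p in enumerate(parts)]

# Returns a list of futures immediately instead of blocking:
futures = client.compute(tasks)
```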