I don't understand the relation between regular Dask and dask.distributed.
With dask.distributed, e.g. using the Futures interface, I have to explicitly create a client, which is backed by a local or remote cluster, and then submit work to it using client.submit().
With regular Dask, e.g. using the Delayed interface, I just use delayed() on my functions.
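To make the two styles concrete, here is a minimal sketch of what I mean. The function `inc` and the two wrapper functions are made up for illustration, and I am assuming an in-process local cluster (`processes=False`, dashboard disabled) just to keep the example light.

```python
from dask import delayed
from dask.distributed import Client


def inc(x):
    # Hypothetical toy function used only for this illustration.
    return x + 1


def delayed_result():
    # Delayed interface: no client is created or passed in here;
    # compute() runs on whatever scheduler Dask currently selects.
    return delayed(inc)(1).compute()


def future_result():
    # Futures interface: a client must exist before anything can be submitted.
    # processes=False keeps the local cluster inside this process.
    with Client(processes=False, dashboard_address=None) as client:
        future = client.submit(inc, 1)
        return future.result()


if __name__ == "__main__":
    print(delayed_result())
    print(future_result())
```

In the first function nothing tells Dask where to run, while in the second the destination is explicit.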
How does delayed (or compute) determine where my computation takes place? There must be some global state behind it, but how would I access it? If I understand correctly, delayed uses a dask.distributed client if one exists. Does it use something like the following?
    from dask.distributed import Client

    client = None
    try:
        client = Client.current()  # raises ValueError if no client exists
    except ValueError:
        pass
    if client is not None:
        ...  # use client
    else:
        ...  # use default scheduler
If so, why not use the same logic for submit?
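    client = None
    try:
        client = Client.current()  # raises ValueError if no client exists
    except ValueError:
        pass
    if client is not None:
        ...  # use client
    else:
        # fail because futures don't work on the default scheduler
        raise RuntimeError("no dask.distributed client available")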
And finally, delayed objects and future objects appear very similar. Why can the former use either a dask.distributed client or the default scheduler, while futures require dask.distributed?
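Regarding the global state I asked about, this is a probe I would try. It rests on an assumption I have not confirmed from the documentation: that creating a Client records itself under the "scheduler" key of dask.config, and that compute() consults that key. The function name `observe_scheduler_setting` is made up for this sketch.

```python
import dask
from dask.distributed import Client


def observe_scheduler_setting():
    # Read the (assumed) global scheduler setting before, during,
    # and after the lifetime of an in-process client.
    before = dask.config.get("scheduler", None)
    with Client(processes=False, dashboard_address=None):
        during = dask.config.get("scheduler", None)
    after = dask.config.get("scheduler", None)
    return before, during, after


if __name__ == "__main__":
    print(observe_scheduler_setting())
```

If the value changes while the client is alive, that would suggest the client registers itself as the default scheduler rather than delayed looking up Client.current() on every call.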