I have a dask.delayed function that takes an xarray.DataArray as an argument and returns one as well. I'm creating a few of these delayed tasks and passing them to client.compute using dask.distributed. Each call to compute returns a distributed.client.Future representing the data array that will be returned.
My question is:
Is there a way to build a "lazy" data array from the future again, without loading the actual data back from the worker? My intention is to build a second task graph based on the output of the first computation.
I've seen client.gather, but this seems to pull all the data back to the client, which is not what I want.
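For reference, this is the call I want to avoid (future refers to the one created in the example below); as far as I understand, gather blocks and copies the finished result from the worker into the client's memory:
# what I want to avoid: this pulls the computed result from the worker
# back into the client process as a concrete xarray object
result = client.gather(future)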
Here's a small example:
import dask
from distributed import Client
import xarray as xr
# load example data
x = xr.tutorial.open_dataset("air_temperature")
# use first timestep
x_t0 = x.isel(time=0)
# delayed 'processing' function
@dask.delayed
def fun(x):
    return x * 2
# init client
client = Client()
# compute on worker
future = client.compute(fun(x_t0))
# when done
print(future)
# <Future: finished, type: xarray.Dataset, key: fun-96cd56f4-4ed3-4eac-ade9-fe3f17e4b8c6>
## now how to get back to lazy xarray from future?
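To make the "second task graph" part more concrete, here is a sketch of the kind of follow-up step I have in mind, continuing the example above (fun2 and the +1 are just made-up placeholders). As far as I understand, a Future can be passed directly to client.submit so the data stays on the worker, but that gives me yet another Future rather than the lazy xarray object I'm after:
# made-up second processing step
def fun2(y):
    return y + 1

# submitting with the Future as an argument keeps the data on the worker;
# the task runs once future has finished, but the result is again a Future
future2 = client.submit(fun2, future)
print(future2)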