I want to use `dask.delayed` in my work. Some of the functions I want to apply it to operate on xarray-based data. I know `xarray` itself can use dask under the hood, and the dask documentation recommends against passing dask arrays into delayed functions. I want to make sure I don't hit a similar pitfall by passing xarray objects to delayed functions.
My current assumption is that I should avoid creating dask-backed xarray datasets, which I think is achieved by avoiding the `chunks` keyword and not using `open_mfdataset`. My question: is this restriction necessary and sufficient to avoid problems?
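To illustrate what I mean by "dask-backed" (using a small synthetic in-memory dataset rather than a real file), my understanding is that an xarray object is dask-backed exactly when the arrays behind its variables' `.data` attributes are dask arrays, which can be inspected directly:

```python
import numpy as np
import xarray as xr

# A small in-memory dataset for illustration (no file I/O needed).
ds = xr.Dataset({"t": ("x", np.arange(4.0))})

# An unchunked dataset is backed by plain NumPy arrays...
print(type(ds["t"].data))  # <class 'numpy.ndarray'>

# ...while calling .chunk() (or passing chunks= to open_dataset)
# swaps in dask arrays under the hood.
chunked = ds.chunk({"x": 2})
print(type(chunked["t"].data))  # a dask array type
```

So the check I'd use before handing a dataset to a delayed function is whether any of its variables report a dask array here.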
To give a concrete (but heavily simplified) example of the kind of code I'm thinking of writing: the idea is that `dask.delayed` handles the parallelism, not `xarray`.
```python
import dask
import xarray as xr

@dask.delayed
def pow2(ds):
    return ds**2

# Open the file lazily, so the I/O also happens inside the task graph.
delayed_ds = dask.delayed(xr.open_dataset)('myfile.nc')
delayed_ds_squared = pow2(delayed_ds)

# dask.compute returns a tuple of results, one per argument.
(ds_squared,) = dask.compute(delayed_ds_squared)
```