I have the following function:

```python
@dask.delayed
def load_ds(p):
    import xarray as xr
    multi_file_dataset = xr.open_mfdataset(p, combine='by_coords', concat_dim="time", parallel=True)
    mean = multi_file_dataset['tas'].mean(dim='time')
    return mean
```

which opens a set of NetCDF files (identified by the path `p`) and calculates the mean of the `tas` variable over time.
I'm trying to run the function in parallel over two different paths (i.e., two datasets):
```python
results = []
result1 = dask.delayed(load_ds)(path1)
results.append(result1)
result2 = dask.delayed(load_ds)(path2)
results.append(result2)
results = dask.compute(*results)
```
I've also tried:

```python
results = []
result1 = dask.delayed(load_ds)(path1)
results.append(result1)
result2 = dask.delayed(load_ds)(path2)
results.append(result2)
futures = dask.persist(*results)
results = dask.compute(*futures)
```
However, I noticed that the actual execution only starts when I retrieve the first result:

```python
print(results[0].values)
```

and again when I retrieve the second one:

```python
print(results[1].values)
```
What am I doing wrong? Is there a way to trigger the computation of both results just once?
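To make the pattern reproducible without my NetCDF files, here is a minimal stand-in: `load_ds` below is a hypothetical simplification (a small dask array replaces `xr.open_mfdataset(...)['tas']`), but the delayed/compute structure is the same as in my code:

```python
import dask
import dask.array as da

@dask.delayed
def load_ds(n):
    # Stand-in for xr.open_mfdataset(...)['tas'].mean(dim='time'):
    # this builds a lazy dask computation and returns it un-computed.
    x = da.ones(n)
    return x.mean()

results = []
results.append(dask.delayed(load_ds)(10))
results.append(dask.delayed(load_ds)(20))
results = dask.compute(*results)

# Each element of `results` is still a lazy dask object: computing the
# outer delayed only ran the function body, which merely constructed the
# inner graph. The numerical work runs when a concrete value is requested:
print(float(results[0]))
print(float(results[1]))
```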