I have the following setup: a function that returns an array, and a Dask array.
I want to call the function inside a for loop and fill the Dask array with the function's return values. This should be done in parallel.

import dask
import dask.array as da
import numpy as np
from dask.distributed import Client

def some_function(params):
    # Do the calculations and return an array
    ...  # calculations
    return some_array

I want to fill the Dask array in parallel like this (the code below does not work, because the output is a delayed object):

if __name__ == '__main__':
    client = Client(n_workers=4)
    N = 20_000
    # (20, 2) is the shape of the array returned by some_function
    X = da.zeros(shape=(N, 20, 2), chunks=(1, 20, 2))
    # List of parameters taken by some_function
    l = [np.random.random(size=3) for i in range(N)]
    for i, param in enumerate(l):
        output = dask.delayed(some_function)(param)
        X[i] = output

What I want is to be able to do both computations and storage in parallel.
Thanks for your help.
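
One possible direction, offered as a minimal sketch rather than a definitive solution: instead of assigning delayed objects into a preallocated array, each delayed call can be wrapped with `da.from_delayed` and the resulting blocks combined with `da.stack`. The body of `some_function` below is a hypothetical stand-in that returns a `(20, 2)` array, and `build_array` is a helper name invented for this illustration. The sketch assumes the output shape and dtype are known in advance, since `da.from_delayed` needs both.

```python
import dask
import dask.array as da
import numpy as np


def some_function(params):
    # Hypothetical stand-in for the real calculation: returns a (20, 2) array.
    return np.full((20, 2), params.sum())


def build_array(param_list, block_shape=(20, 2), dtype=float):
    # Wrap each call in a delayed task and expose it as a lazy Dask array block.
    blocks = [
        da.from_delayed(dask.delayed(some_function)(p), shape=block_shape, dtype=dtype)
        for p in param_list
    ]
    # Stack along a new leading axis: shape (N, 20, 2), one chunk per call.
    return da.stack(blocks, axis=0)


if __name__ == "__main__":
    N = 1_000  # smaller than the 20_000 in the question, to keep the sketch quick
    params = [np.random.random(size=3) for _ in range(N)]
    X = build_array(params)  # nothing has been computed yet
    result = X.compute()     # tasks run in parallel on the active scheduler
    print(result.shape)
```

Nothing runs until `compute()` is called. If a `dask.distributed.Client` has been created beforehand, as in the question, it becomes the default scheduler and the tasks are distributed across its workers.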