
I have the following setup: a function that returns an array, and a Dask array.

I want to call the function inside a for loop and fill the Dask array with the values it returns. This should be done in parallel.

import dask
import dask.array as da
import numpy as np
from dask.distributed import Client

def some_function(params):
    # Do the calculations and return an array of shape (20, 2)
    ...  # calculations
    return some_array

I want to fill the Dask array in parallel like this (the code below doesn't work, because the output is a delayed object):

if __name__ == '__main__':

    client = Client(n_workers=4)

    N = 20_000

    # (20, 2) is the shape of the array returned by some_function
    X = da.zeros(shape=(N, 20, 2), chunks=(1, 20, 2))

    # List of parameters taken by some_function
    l = [np.random.random(size=3) for i in range(N)]

    for i, param in enumerate(l):
        output = dask.delayed(some_function)(param)
        X[i] = output  # fails: output is a Delayed object, not an array

What I want is to run both the computations and the storage in parallel.

Thanks for your help.

python_user
alpha027
  • Did my suggestion work? – python_user Oct 16 '21 at 16:26
  • @python_user, it did not; I am still trying. The function used in my case is SciPy's odeint. The new error I get is: TypeError: Delayed objects are immutable. Your suggestion is definitely getting me closer to the solution. – alpha027 Oct 16 '21 at 16:50
  • Can you edit the question to include the function and your exact code? – python_user Oct 16 '21 at 16:50
  • Your solution is working; I created a more specific question here: https://stackoverflow.com/questions/69597892/using-dask-with-scipy-odeint-delayed-objects-are-immutable – alpha027 Oct 16 '21 at 17:14
  • I will give that a look – python_user Oct 16 '21 at 17:16
  • All I had to do was quit and reopen my code editor, and the problem was gone. Your solution does work; I deleted the other post, as there was no actual issue in it. Thanks @python_user! – alpha027 Oct 16 '21 at 17:45
  • I have edited the title to describe it better. – python_user Oct 16 '21 at 17:56

1 Answer


You seem to want dask.array.from_delayed. You can then call .compute() on the result later, when you need the values.

import numpy as np
import dask
import dask.array as da
from dask.distributed import Client


@dask.delayed
def some_function(param):
    # Stand-in for the real computation; returns a (20, 2) array
    return np.random.rand(20, 2)


if __name__ == "__main__":
    client = Client(n_workers=2)
    N = 10
    X = da.zeros(shape=(N, 20, 2), chunks=(1, 20, 2))
    l = [np.random.random(size=3) for i in range(N)]

    for i, param in enumerate(l):
        output = some_function(param)  # a Delayed object; nothing runs yet
        # Wrap the Delayed in a Dask array of known shape/dtype so it can be assigned
        X[i] = da.from_delayed(output, shape=(20, 2), dtype=np.float64)

Output

print(X[0].compute())

[[0.3521712  0.6159578 ]
 [0.67023109 0.13890086]
 [0.71952075 0.3986291 ]
 [0.76702816 0.84669244]
 [0.82703851 0.72321066]
 [0.92060717 0.77926133]
 [0.27857667 0.2510426 ]
 [0.85014582 0.34709649]
 [0.46328749 0.44324011]
 [0.84134094 0.28890227]
 [0.33616886 0.09771338]
 [0.35734385 0.0832578 ]
 [0.04038898 0.41059205]
 [0.01776568 0.31226509]
 [0.03036941 0.70490505]
 [0.78646762 0.33381309]
 [0.02535621 0.5715431 ]
 [0.16349511 0.37746425]
 [0.11798384 0.87281911]
 [0.26136318 0.59016981]]
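A variant that avoids assigning into X row by row is to build a list of from_delayed pieces and combine them with da.stack. This is a sketch, not the code above: the body of some_function is a hypothetical stand-in for the real calculation (for example an odeint call), and build_array is a hypothetical helper name.

```python
import numpy as np
import dask
import dask.array as da


@dask.delayed
def some_function(param):
    # Hypothetical stand-in for the real computation;
    # it must return a NumPy array of shape (20, 2).
    return np.outer(np.ones(20), param[:2])


def build_array(params, shape=(20, 2), dtype=np.float64):
    # Wrap each delayed result as a one-chunk Dask array of known shape/dtype.
    pieces = [da.from_delayed(some_function(p), shape=shape, dtype=dtype)
              for p in params]
    # Stack along a new leading axis: the result has shape (N, 20, 2)
    # and chunks of (1, 20, 2), with no item assignment needed.
    return da.stack(pieces, axis=0)


if __name__ == "__main__":
    params = [np.random.random(size=3) for _ in range(10)]
    X = build_array(params)
    result = X.compute()  # the delayed calls are executed here, in parallel
    print(result.shape)  # (10, 20, 2)
```

Nothing is computed until .compute() is called. If a Client is created first, as in the answer, the computation runs on its workers; otherwise Dask's default threaded scheduler is used. Stacking the pieces may also keep the task graph simpler than issuing one assignment per row when N is large (e.g. 20,000).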
python_user