
Let's say I have something like this:

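import numpy as np
import dask
import dask.array

def foo(a):
    return a.sum()

# 1,000,000 x 70 float64 array (about 560 MB), wrapped as a dask array
x = np.random.rand(1000000, 70)
X = dask.array.from_array(x)

# 600 delayed calls, each taking the same dask array as input
X_list = [dask.delayed(foo)(X) for n in range(600)]

Xsums = dask.compute(*X_list)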

This seems to hang during execution, apparently because of memory issues, which makes me suspect that when compute is run, dask creates many copies of the array X in memory. When there are only 2 elements in X_list, this executes fine.

Is there a way to make dask use pointers to X instead of creating 600 copies of X?
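
One pattern that may be relevant here, shown only as a sketch: wrap the NumPy array in a single dask.delayed object and pass that same object to every call, so that all 600 tasks refer to one shared node in the task graph. The array size is reduced here so the sketch runs quickly, and x_delayed is a name introduced for illustration. Whether this removes the memory problem described above is an assumption that depends on the scheduler and the dask version; it has not been verified against the original setup.

```python
import numpy as np
import dask

def foo(a):
    return a.sum()

# Much smaller than the array in the question, so the sketch runs quickly.
x = np.random.rand(1000, 70)

# Wrap the array once; x_delayed is a hypothetical name for the shared node.
x_delayed = dask.delayed(x)

# Each task takes the same delayed object as its input.
X_list = [dask.delayed(foo)(x_delayed) for n in range(600)]

Xsums = dask.compute(*X_list)
```

If the function is only a reduction such as a sum, calling the reduction on the dask array itself (for example X.sum()) keeps the work chunked instead of handing the whole array to each task.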

TKK

0 Answers