I would like to assign multiple small Dask arrays into parts of one large Dask array. My problem is similar to the one addressed in this post, except my small arrays have variable shapes. It is also similar to the one addressed in this post, except I would like to assign them to 2D locations in the array that aren't sequential, as they would be in a for loop, which also means that operations like stack and concatenate don't play nicely.
import dask
import dask.array as da

## Initialize large array
big_array = da.zeros([5, 6]) # I know this shape ahead of time
# Mock little arrays
aa_shape = dask.delayed((2,3)) # I don't know this shape ahead of time
aa = dask.delayed(1 * da.ones(aa_shape))
aa_loc = dask.delayed((slice(0,2), slice(0,3))) # I don't know this location ahead of time
bb_shape = dask.delayed((3,3))
bb = dask.delayed(2 * da.ones(bb_shape))
bb_loc = dask.delayed((slice(0,3), slice(3,6)))
cc_shape = dask.delayed((3,3))
cc = dask.delayed(3 * da.ones(cc_shape))
cc_loc = dask.delayed((slice(2,5), slice(0,3)))
dd_shape = dask.delayed((2,3))
dd = dask.delayed(4 * da.ones(dd_shape))
dd_loc = dask.delayed((slice(3,5), slice(3,6)))
# Manually populate big array
big_array[aa_loc] = aa
big_array[bb_loc] = bb
big_array[cc_loc] = cc
big_array[dd_loc] = dd
big_array.compute()
Ideally the above code would output a big_array that looks like
array([[1., 1., 1., 2., 2., 2.],
       [1., 1., 1., 2., 2., 2.],
       [3., 3., 3., 2., 2., 2.],
       [3., 3., 3., 4., 4., 4.],
       [3., 3., 3., 4., 4., 4.]])
However, I get the error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [7], in <cell line: 21>()
18 locs = [aa_loc, bb_loc, cc_loc, dd_loc]
20 # Manually populate big array
---> 21 big_array[aa_loc] = aa
22 big_array[bb_loc] = bb
23 big_array[cc_loc] = cc
File ~/.conda-envs/daskenv202301/lib/python3.9/site-packages/dask/array/core.py:1893, in Array.__setitem__(self, key, value)
1890 value = asanyarray(value)
1892 out = "setitem-" + tokenize(self, key, value)
-> 1893 dsk = setitem_array(out, self, key, value)
1895 meta = meta_from_array(self._meta)
1896 if np.isscalar(meta):
File ~/.conda-envs/daskenv202301/lib/python3.9/site-packages/dask/array/slicing.py:1754, in setitem_array(out_name, array, indices, value)
1752 array_shape = array.shape
1753 value_shape = value.shape
-> 1754 value_ndim = len(value_shape)
1756 # Reformat input indices
1757 indices, implied_shape, reverse, implied_shape_positions = parse_assignment_indices(
1758 indices, array_shape
1759 )
File ~/.conda-envs/daskenv202301/lib/python3.9/site-packages/dask/delayed.py:591, in Delayed.__len__(self)
589 def __len__(self):
590 if self._length is None:
--> 591 raise TypeError("Delayed objects of unspecified length have no len()")
592 return self._length
TypeError: Delayed objects of unspecified length have no len()
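Reading the traceback, the failure seems to come from setitem_array calling len() on value.shape, which is itself a lazy object when value is a delayed. A minimal sketch that I would expect to hit the same TypeError without any array assignment (the names lazy, lazy_shape and error_message are just for illustration, and I wrap a NumPy array here rather than a Dask array to keep it small):

```python
import dask
import numpy as np

# Wrap a concrete array in a delayed object, similar to the little arrays above
lazy = dask.delayed(np.ones((2, 3)))

# Attribute access on a delayed object is lazy too, so this is not a real tuple
lazy_shape = lazy.shape

try:
    len(lazy_shape)  # the same kind of call that setitem_array makes on value.shape
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If that reading is right, the assignment needs the shape of the value at graph-construction time, which a delayed object cannot provide.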
If I modify the code so that the little arrays are plain Dask arrays instead of delayed objects, the code runs successfully. Does anyone have suggestions on how to approach this? Thanks for the help!
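For reference, here is a minimal sketch of the variant that runs, assuming the shapes and locations are known up front (which is not my real situation). The pieces list and the result name are only for illustration:

```python
import dask.array as da

# Large array whose shape is known ahead of time
big_array = da.zeros((5, 6))

# Plain Dask arrays with concrete shapes, paired with plain tuples of slices
pieces = [
    ((slice(0, 2), slice(0, 3)), 1 * da.ones((2, 3))),
    ((slice(0, 3), slice(3, 6)), 2 * da.ones((3, 3))),
    ((slice(2, 5), slice(0, 3)), 3 * da.ones((3, 3))),
    ((slice(3, 5), slice(3, 6)), 4 * da.ones((2, 3))),
]

# Slice assignment works here because every shape is known when the graph is built
for loc, piece in pieces:
    big_array[loc] = piece

result = big_array.compute()
print(result)
```

The only difference from the failing code is that nothing here is wrapped in dask.delayed, so the shapes and slices are available immediately.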