Suppose we have a ragged, nested sequence like the following:
```python
import numpy as np

x = np.ones((10, 20))
y = np.zeros((10, 20))
a = [[0, x], [y, 1]]
```
and want to create a full NumPy array that broadcasts the mismatched entries (here, the scalars 0 and 1) where necessary, so that they match the largest shape of any other entry, in this case (10, 20). First, we might try `np.array(a)`, which yields the warning:

```
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
```

(In NumPy 1.24 and later this deprecation has expired, and `np.array(a)` raises a `ValueError` instead.)
By changing to `np.array(a, dtype=object)`, we do get an array. However, it is an array of objects rather than floats, and it keeps the ragged entries as they are instead of broadcasting them as desired. To fix this, I wrote a function `to_array` that takes a (possibly ragged, nested) sequence and a target shape and returns a full NumPy array of that shape:
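```python
def to_array(a, shape):
    # Outer object array holding the ragged entries as-is, shape (2, 2) here.
    a = np.array(a, dtype=object)
    # Preallocated float array with the full target shape.
    b = np.empty(shape)
    for index in np.ndindex(a.shape):
        # b[index] is a (10, 20) slab here; assigning a scalar or array broadcasts it.
        b[index] = a[index]
    return b

b = np.array(a, dtype=object)
c = to_array(a, (2, 2, 10, 20))
print(b.shape, b.dtype)  # prints (2, 2) object
print(c.shape, c.dtype)  # prints (2, 2, 10, 20) float64
```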
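To make the difference between `b` and `c` concrete, here is a small self-contained check that repeats the definitions above. It is only an illustration under the same assumptions (`x`, `y`, `a` and `to_array` as defined in this post): `b` still stores the original Python objects, while `c` stores floats with the scalars broadcast into full slabs.

```python
import numpy as np

def to_array(a, shape):
    # Same function as above: fill a float array slab by slab.
    a = np.array(a, dtype=object)
    b = np.empty(shape)
    for index in np.ndindex(a.shape):
        b[index] = a[index]
    return b

x = np.ones((10, 20))
y = np.zeros((10, 20))
a = [[0, x], [y, 1]]

b = np.array(a, dtype=object)
c = to_array(a, (2, 2, 10, 20))

# b keeps the raw entries: a Python int in one cell, a (10, 20) array in another.
print(b[0, 0], b[0, 1].shape)  # prints 0 (10, 20)
# c is plain float64; the scalar 0 became a (10, 20) slab of zeros.
print(np.all(c[0, 0] == 0), np.array_equal(c[0, 1], x))  # prints True True
```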
Note that `c`, not `b`, is the desired result. However, `to_array` relies on a Python `for` loop over `np.ndindex(a.shape)`, with one iteration per outer entry, and Python-level loops are slow when the outer array is large.

Is there an alternative, vectorized way to write the `to_array` function?
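For comparison, here is one possible variation, sketched under the assumption that NumPy 1.20 or later is available (for `np.broadcast_shapes`) and that all entries are mutually broadcastable. The name `to_array_inferred` is hypothetical. It infers the inner shape instead of taking it as an argument, then builds the result with `np.broadcast_to` and `np.stack`. It still iterates over the outer entries in Python (inside the generator and the list comprehension), so it does not remove the Python-level loop; it only restructures it.

```python
import numpy as np

def to_array_inferred(a):
    """Hypothetical variant of to_array that infers the inner shape.

    Assumes NumPy >= 1.20 (np.broadcast_shapes) and broadcastable entries.
    """
    outer = np.array(a, dtype=object)
    # Common broadcast shape of all outer entries, (10, 20) in this example.
    inner = np.broadcast_shapes(*(np.shape(e) for e in outer.flat))
    # Broadcast each entry to the inner shape (read-only views, no copies yet).
    slabs = [np.broadcast_to(np.asarray(e, dtype=float), inner) for e in outer.flat]
    # np.stack copies into one float array; reshape restores the outer shape.
    return np.stack(slabs).reshape(outer.shape + inner)

x = np.ones((10, 20))
y = np.zeros((10, 20))
a = [[0, x], [y, 1]]

c = to_array_inferred(a)
print(c.shape, c.dtype)  # prints (2, 2, 10, 20) float64
```

As with `to_array`, the Python overhead here grows with the number of outer entries rather than with the size of each (10, 20) slab, since the per-entry broadcasting and copying happen inside NumPy.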