
I have two 2D matrices in a list, which I want to convert to a NumPy array. Below are three examples: a, b and c.

>>> import numpy as np
>>> a = [np.zeros((3,5)), np.zeros((2,9))]
>>> np.array(a)
array([array([[0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 0.]]),
    array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)
>>> b = [np.zeros((3,5)), np.zeros((3,9))]
>>> np.array(b)
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2019.2.4\helpers\pydev\_pydevd_bundle\pydevd_exec.py", line 3, in Exec
    exec exp in global_vars, local_vars
  File "<input>", line 1, in <module>
ValueError: could not broadcast input array from shape (3,5) into shape (3)
>>> c = [np.zeros((3,5)), np.zeros((4,9))]
>>> np.array(c)
array([array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]]),
array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)

As you can see, cases a and c work, but b raises an exception. The difference is that in example b the first dimensions of the two matrices match.

I found the following answer, which explains why this behaviour occurs.

If only the first dimension does not match, the arrays are still matched, but as individual objects, no attempt is made to reconcile them into a new (four dimensional) array.

My question: I don't want NumPy to reconcile the matrices. I want the same behaviour as when the first dimensions don't match: the matrices should be stored as individual objects even if they have the same first dimension. How do I achieve this?

KoKlA

2 Answers


Numpy still complains even if you explicitly pass object as the dtype:

>>> np.array(b, dtype=object)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (3,5) into shape (3)

Essentially, NumPy is not really written around dtype=object; it always assumes you want an array with a primitive numeric or structured dtype.

So I think your only option is something like:

>>> arr = np.empty(len(b), dtype=object)
>>> arr[:] = b
>>> arr
array([array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)
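
If you need this in more than one place, the preallocate-then-fill idea can be wrapped in a small helper. This is only a sketch: the name to_object_array is made up for illustration and is not part of NumPy. It fills the slots one at a time rather than with a slice assignment, on the assumption that per-element assignment is the form least likely to trigger any attempt to merge the items, including when every matrix has exactly the same shape.

```python
import numpy as np

def to_object_array(items):
    """Pack each item into one slot of a 1-D object array (hypothetical helper)."""
    items = list(items)
    out = np.empty(len(items), dtype=object)
    # Assign slot by slot so each matrix is stored as a single object.
    for i, item in enumerate(items):
        out[i] = item
    return out

b = [np.zeros((3, 5)), np.zeros((3, 9))]
arr = to_object_array(b)
print(arr.shape, arr.dtype)        # (2,) object
print([m.shape for m in arr])      # [(3, 5), (3, 9)]

# Matrices with identical shapes are kept as two separate objects as well.
same = to_object_array([np.zeros((3, 5)), np.zeros((3, 5))])
print(same.shape)                  # (2,)
```

The result is always one-dimensional with one entry per input matrix, whatever the shapes of the matrices are.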

And just for fun, you can use the actual np.ndarray type constructor, although this isn't very easy:

>>> np.ndarray(dtype=object, shape=len(b), buffer=np.array(list(map(id, b)),dtype=np.uint64))
array([array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)

Note that this relies on a CPython implementation detail: id is simply the memory address of the Python object. So I'm mostly showing it for fun.

juanpa.arrivillaga

In version 1.19.0 (the latest at the time of writing), NumPy starts to emit a warning:

In [185]: np.__version__                                                                             
Out[185]: '1.19.0'
                                                
In [187]: np.array([np.zeros((3,5)), np.zeros((2,9))])                                               
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  #!/usr/bin/python3
Out[187]: 
array([array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)

It still makes the object dtype array. When the first dimensions match, we get both the warning and the error.

In [188]: np.array([np.zeros((3,5)), np.zeros((3,9))])                                               
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  #!/usr/bin/python3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-188-b6a4475774d0> in <module>
----> 1 np.array([np.zeros((3,5)), np.zeros((3,9))])

ValueError: could not broadcast input array from shape (3,5) into shape (3)

Basically, np.array first tries to make a multidimensional numeric array. Failing that, it takes one of two routes: it either makes an object dtype array or raises an error. The details are buried in compiled code.

Preallocating the array and then assigning to it is the best way if you want full control over how the object array is created.

In [189]: res=np.empty(2,object)                                                                     
In [191]: res[:] = [np.zeros((3,5)), np.zeros((3,9))]                                                
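
As a quick sanity check, here is the same preallocate-and-assign pattern as a plain script, followed by a look at what ends up in the array. The variable names shapes and totals are just illustrative. The point is that the outer array is 1-D with two slots, and each slot still holds a full 2D matrix that you work with by looping over the slots.

```python
import numpy as np

# Preallocate a 1-D object array, then fill it with the two matrices.
res = np.empty(2, dtype=object)
res[:] = [np.zeros((3, 5)), np.zeros((3, 9))]

# The outer array has one slot per matrix; each slot keeps its own shape.
shapes = [m.shape for m in res]
# Per-matrix operations are applied by iterating over the slots.
totals = [float(m.sum()) for m in res]

print(res.shape)   # (2,)
print(shapes)      # [(3, 5), (3, 9)]
print(totals)      # [0.0, 0.0]
```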
hpaulj