
Some answers on Stack Overflow suggest using an ndarray of ndarrays when working with data in which the number of elements per row is not constant (see "How to make a multidimension numpy array with a varying row size?").

Is NumPy optimized to work on a structure like that (an array of arrays, also called nested arrays)?

Here's a simplified example of such a structure:

import numpy as np
x = np.array([1,2,3])
y = np.array([4,5])
data = np.array([x,y],dtype=object)

It's possible to do operations like:

print(data+1)
print(data+data)

But some operations fail, such as:

print(np.sum(data))

What's happening behind the scenes with this type of structure?

user18048269
  • No. Such an array is basically the same as a list, containing references to the component arrays. – hpaulj Jan 30 '22 at 18:01
  • Check this ;) https://numpy.org/devdocs/dev/internals.html if you want to know more about how the NumPy array is organized in memory. – Khamyl Jan 30 '22 at 18:06
  • My comment is basically a repeat of the accepted answer in your link. There's a difference between explaining what can be done, and suggesting such a use. – hpaulj Jan 30 '22 at 18:40
  • Thanks for your answers. I updated the question to make it more precise. – user18048269 Jan 30 '22 at 20:05
  • What was the `sum` error message? – hpaulj Jan 30 '22 at 20:28
  • Math on such an array is hit or miss. Typically it iterates through the elements and tries to apply the operator or a method. But that iteration can easily fail, as in the `np.sum` case, or `np.exp`. And the speed is basically that of a list comprehension, when it does work. Compared to an equivalent list, an array can, on occasion, be more convenient, but don't ever assume it is just as good as a numeric array. – hpaulj Jan 30 '22 at 20:41

1 Answer


Like a list, an object dtype array can contain objects of any kind. For example

In [6]: arr = np.array([1,"two",[1,2,3],np.array([4,5,6])], object)
In [7]: arr
Out[7]: array([1, 'two', list([1, 2, 3]), array([4, 5, 6])], dtype=object)

Look what happens when we do addition:

In [8]: arr+arr
Out[8]: 
array([2, 'twotwo', list([1, 2, 3, 1, 2, 3]), array([ 8, 10, 12])],
      dtype=object)
In [10]: arr*2
Out[10]: 
array([2, 'twotwo', list([1, 2, 3, 1, 2, 3]), array([ 8, 10, 12])],
      dtype=object)

For lists and strings, these operations are defined as concatenation and replication. In effect, NumPy is doing [x.__add__(x) for x in arr], where __add__ is the class-specific operation.
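As a rough check of that equivalence, the sketch below compares arr + arr against a plain list comprehension. The names items, summed, and by_hand are only illustrative, and the array is filled element by element so that its construction does not depend on how np.array treats nested sequences.

```python
import numpy as np

# Illustrative names: items, summed, by_hand are not from the original post.
items = [1, "two", [1, 2, 3], np.array([4, 5, 6])]

# Fill an object array one slot at a time so each slot holds the Python
# object itself (an int, a str, a list, an ndarray).
arr = np.empty(len(items), dtype=object)
for i, item in enumerate(items):
    arr[i] = item

# Object-dtype addition: NumPy applies "+" to each pair of elements.
summed = arr + arr

# The same thing written as an explicit Python-level loop.
by_hand = [x + x for x in items]

for a, b in zip(summed, by_hand):
    # np.array_equal also handles the scalar, string, and list cases.
    print(repr(a), np.array_equal(a, b))
```

Each element is handled by its own class's addition, so the speed is about that of the list comprehension rather than that of a numeric array.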

np.exp doesn't work because it tries to do [x.exp() for x in arr], and almost no one defines an exp method.

In [11]: np.exp(arr)
AttributeError: 'int' object has no attribute 'exp'

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<ipython-input-11-16c1c90aa297>", line 1, in <module>
    np.exp(arr)
TypeError: loop of ufunc does not support argument 0 of type int which has no callable exp method
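When every element is itself a numeric array, as in the question's data, one way around this is to apply the ufunc to each row yourself. This is only a sketch: exp_rows and flat_exp are illustrative names, and whether to keep the rows separate or flatten them depends on what you need afterwards.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0])

# Build the ragged object array one slot at a time.
data = np.empty(2, dtype=object)
data[0] = x
data[1] = y

# The ufunc looks for an .exp() method on each element; ndarray has none,
# so this raises TypeError, just as it does for the int above.
try:
    np.exp(data)
    ufunc_failed = False
except TypeError:
    ufunc_failed = True

# Option 1: loop in Python and call np.exp on each (numeric) row.
exp_rows = [np.exp(row) for row in data]

# Option 2: if row boundaries don't matter, flatten to one numeric array
# first, so np.exp runs as a single compiled loop.
flat_exp = np.exp(np.concatenate(list(data)))

print(ufunc_failed)
print(exp_rows)
print(flat_exp)
```

The first option keeps the ragged structure but still iterates at Python speed; the second gives up the structure in exchange for an ordinary numeric array.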
hpaulj