3

I wish to manipulate named numpy arrays (add, multiply, concatenate, ...).

I defined structured arrays:

types=[('name1', int), ('name2', float)]
a = np.array([2, 3.3], dtype=types)
b = np.array([4, 5.35], dtype=types)

a and b are created such that

a
array([(2, 2. ), (3, 3.3)], dtype=[('name1', '<i8'), ('name2', '<f8')])

But I really want a['name1'] to be just 2, not array([2, 3]).

Similarly, I want a['name2'] to be just 3.3.

This way I could sum c = a + b, which I expect to be an array of length 2, where c['name1'] is 6 and c['name2'] is 8.65.

How can I do that?

  • There is a possibility to do this in pandas. If you're interested in the pandas solution, let me know. Though I don't know if it's somehow possible in numpy – pythonic833 Jun 19 '18 at 14:59
  • @pythonic833 I was working with a pandas DataFrame but it was too slow. The larger system considerations are out of scope for Stack Overflow, and I might have made a design mistake that pushed me into checking the option of numpy named arrays. Please do tell me what you had in mind with pandas; it could very well be that I missed it. – Amitai Jun 19 '18 at 15:10
  • `numpy` does not do math with whole structured arrays. Given the generality of a structured dtype, that doesn't have a clear definition; for example, some fields might be strings. You can work field by field, e.g. `a['name1']+b['name1']`. – hpaulj Jun 19 '18 at 16:19
  • @hpaulj, in my real use case I actually have strings in other array positions (as well as int and float, as in my simplified example). I even have my own classes in other positions. What is common to all these classes is that they have a well-defined `__add__` method, hence one could expect this mixture of types in different entries of the array to be allowed. I think I'll go back to pandas though, as suggested by anishtain4 and probably by pythonic833 too. Thank you all. – Amitai Jun 20 '18 at 05:52

2 Answers

4

Define a structured array:

In [125]: dt = np.dtype([('f0','U10'),('f1',int),('f2',float)])
In [126]: a = np.array([('one',2,3),('two',4,5.5),('three',6,7)],dt)
In [127]: a
Out[127]: 
array([('one', 2, 3. ), ('two', 4, 5.5), ('three', 6, 7. )],
      dtype=[('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])

And an object dtype array with the same data:

In [128]: A = np.array([('one',2,3),('two',4,5.5),('three',6,7)],object)
In [129]: A
Out[129]: 
array([['one', 2, 3],
       ['two', 4, 5.5],
       ['three', 6, 7]], dtype=object)

Addition works because it (iteratively) delegates the operation to the individual elements:

In [130]: A+A
Out[130]: 
array([['oneone', 4, 6],
       ['twotwo', 8, 11.0],
       ['threethree', 12, 14]], dtype=object)

Structured addition does not work:

In [131]: a+a
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-131-6ff992d1ddd5> in <module>()
----> 1 a+a

TypeError: ufunc 'add' did not contain a loop with signature matching types 
dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')]) dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')]) 
dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])

Let's try addition field by field:

In [132]: aa = np.zeros_like(a)
In [133]: for n in a.dtype.names: aa[n] = a[n] + a[n]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-133-68476e5d579e> in <module>()
----> 1 for n in a.dtype.names: aa[n] = a[n] + a[n]

TypeError: ufunc 'add' did not contain a loop with signature matching types 
dtype('<U10') dtype('<U10') dtype('<U10')

Oops, that doesn't quite work: the string dtype doesn't support addition. But we can handle the string field separately:

In [134]: aa['f0'] = a['f0']
In [135]: for n in a.dtype.names[1:]: aa[n] = a[n] + a[n]
In [136]: aa
Out[136]: 
array([('one',  4,  6.), ('two',  8, 11.), ('three', 12, 14.)],
      dtype=[('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])

Or we can change the string dtype to object:

In [137]: dt1 = np.dtype([('f0',object),('f1',int),('f2',float)])
In [138]: b = np.array([('one',2,3),('two',4,5.5),('three',6,7)],dt1)
In [139]: b
Out[139]: 
array([('one', 2, 3. ), ('two', 4, 5.5), ('three', 6, 7. )],
      dtype=[('f0', 'O'), ('f1', '<i8'), ('f2', '<f8')])
In [140]: bb = np.zeros_like(b)
In [141]: for n in a.dtype.names: bb[n] = b[n] + b[n]
In [142]: bb
Out[142]: 
array([('oneone',  4,  6.), ('twotwo',  8, 11.), ('threethree', 12, 14.)],
      dtype=[('f0', 'O'), ('f1', '<i8'), ('f2', '<f8')])

Python strings do have an __add__, defined as concatenation; numpy's fixed-width string dtypes ('<U10') don't define it. Python strings can also be multiplied by an integer, but other arithmetic raises an error.
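
A rough sketch of that difference (plain snippets, not a saved session; np.char.add is numpy's element-wise string concatenation):

import numpy as np

s = 'one'
s + s              # 'oneone'    -- str.__add__ concatenates
s * 3              # 'oneoneone' -- a str can be repeated by an int
# s * s            # TypeError (a str can only be repeated by an int)

u = np.array(['one', 'two'], dtype='U10')
# u + u            # raised the TypeError shown above (no 'add' loop for '<U10')
np.char.add(u, u)  # array(['oneone', 'twotwo'], ...) -- element-wise concatenation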

My guess is that pandas resorts to something like what I just did. I doubt that it implements DataFrame addition in compiled code (except in some special cases); it probably works column by column when the dtype allows it. It also seems to switch freely to object dtype (for example, a column holding both np.nan and a string). Timings might confirm my guess (I don't have pandas installed on this OS).
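
A rough sketch of what checking that would look like (assuming pandas is available; untimed):

import pandas as pd

df = pd.DataFrame({'f0': ['one', 'two', 'three'],
                   'f1': [2, 4, 6],
                   'f2': [3.0, 5.5, 7.0]})
df.dtypes   # f0 is object, f1 is int64, f2 is float64
df + df     # works column by column: the string column concatenates, the numeric columns add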

hpaulj
  • What a great detailed answer, thanks a lot. I wasn't aware of the possibility of setting the dtype parameter of np.array to object. I just hope it won't be as slow as the pandas alternative (which is efficient for big columnar tables, but not as much for MANY small 1-row objects) – Amitai Jun 20 '18 at 08:01
3

According to the documentation, the right way to make your arrays is:

types=[('name1', int), ('name2', float)]
a = np.array([(2, 3.3)], dtype=types)
b = np.array([(4, 5.35)], dtype=types)

This generates a and b as you want them:

a['name1']
array([2])

But summing them is not as straightforward as with conventional numpy arrays, so I also suggest using pandas:

import pandas as pd

names = ['name1', 'name2']
a = pd.Series([2, 3.3], index=names)
b = pd.Series([4, 5.35], index=names)
a + b
name1    6.00
name2    8.65
dtype: float64
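
Alternatively, to stay with the structured arrays, a field-by-field loop (the approach from the other answer) gives the same sums; a minimal sketch:

import numpy as np

types = [('name1', int), ('name2', float)]
a = np.array([(2, 3.3)], dtype=types)
b = np.array([(4, 5.35)], dtype=types)

c = np.zeros_like(a)
for name in a.dtype.names:         # loop over the field names
    c[name] = a[name] + b[name]    # add each field separately

c['name1']   # array([6])
c['name2']   # array([8.65])

This keeps everything in numpy, at the cost of a Python-level loop over the fields.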
anishtain4