Naming fields in NumPy Arrays

Question

I am interested in creating a 2D NumPy array where each row can be referred to by a unique field name. I have experimented with record arrays like this:

>>> a = np.recarray((2, 10), dtype=[('x', 'f8'), ('y', 'f8')])

But this fails when I do simple arithmetic like so:

>>> a += 4.0
TypeError: invalid type promotion

Is there a way to use named fields that doesn't require setting different data types for each field, or that won't fail when I try to do array math?

score 2 · Answer 1 · answered Apr 05 '16 at 18:51


A recarray exposes its fields as object attributes, so this works:

a.x += 4
a.y += 5
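A minimal sketch of that field-by-field approach. It assumes the array is zero-initialized first, since np.recarray(...) on its own returns uninitialized memory; building it with np.zeros(...).view(np.recarray) is a choice made for this sketch, not something from the question.

```python
import numpy as np

# Assumption: start from zeros so the results are predictable.
a = np.zeros((2, 10), dtype=[('x', 'f8'), ('y', 'f8')]).view(np.recarray)

# Each field is an ordinary (2, 10) float array, so arithmetic works on it.
a.x += 4
a.y += 5

print(a.x.shape)  # (2, 10)
print(a.x[0, 0], a.y[0, 0])
```

The same fields are also reachable with dictionary-style indexing, a['x'] and a['y'], which works on plain structured arrays as well as recarrays.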


Hun


Roman Kh · Answer 2 · 2016-04-05T19:57:22.640


np.recarray((2, 10), dtype=[('x', 'f8'), ('y', 'f8')]) returns a 2x10 array in which each element is a record with two fields, x and y. Adding a scalar to a whole record is not defined, so the operation a + 4.0 fails.

You have to access each field of a recarray independently:

a[0,0].x += 4.0
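As a rough illustration of per-record access, the sketch below updates one record and then does the same thing through the field array. The zero-initialized array is an assumption made so that the values are predictable.

```python
import numpy as np

# Assumption: zero-initialized recarray with the question's dtype.
a = np.zeros((2, 10), dtype=[('x', 'f8'), ('y', 'f8')]).view(np.recarray)

# a[0, 0] is a single record; setting its x field writes into the array.
a[0, 0].x += 4.0

# Indexing the field first and the element second reaches the same kind of slot.
a.x[0, 1] += 4.0

print(a.x[0, :3])
```

Only the two touched elements change; every other x value and all y values stay at zero.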

edited Apr 05 '16 at 19:57

answered Apr 05 '16 at 18:54

Roman Kh


`a[0,0]` is the preferred way of indexing a 2d array (rec or not). – hpaulj Apr 05 '16 at 19:54

score 2 · Accepted Answer · edited May 23 '17 at 11:59

recarray and structured arrays are not designed to be convenient ways of naming columns. They are meant to hold a mix of data types, the kind of thing you might load from a CSV file - strings, integers, floats, dates.

The operations that can be performed across fields are limited. As you found, you cannot add a value to the whole array; you have to add it field by field, provided the field type is right. Similarly, you can't sum the two fields or take their mean with np.sum or np.mean. You also can't reshape or transpose across fields (exchanging fields for rows, etc.).

"Constructing np.array with overlapping fields in dtype" is a current SO question that illustrates several ways of accessing a couple of fields as a 2 column array.
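When all fields share one type, one way to get around the cross-field limits is to convert the structured array to a plain one. The sketch below uses numpy.lib.recfunctions.structured_to_unstructured (NumPy 1.16 and later); it is not necessarily one of the approaches shown in the linked question, and the sample values are made up for illustration.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative data: every x is 1.0 and every y is 3.0.
s = np.zeros((2, 10), dtype=[('x', 'f8'), ('y', 'f8')])
s['x'] = 1.0
s['y'] = 3.0

# The fields become a trailing axis, giving shape (2, 10, 2).
plain = rfn.structured_to_unstructured(s)

plain += 4.0                      # whole-array arithmetic now works
field_sum = plain.sum(axis=-1)    # x + y for every element
field_mean = plain.mean(axis=-1)  # mean of x and y for every element

print(plain.shape)
```

Whether the result shares memory with the structured array depends on the dtype layout, so treat plain as a separate working array unless you have checked.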

It is better to stick with normal ndarrays unless you really need the added flexibility of a structured array. If you want to access columns by name, consider defining variables, e.g. ind_x = 0, ind_y = 1, so you can use a[2:5, ind_x].
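A short sketch of the named-index idea, assuming a layout of 10 rows and 2 columns, with x in column 0 and y in column 1. The shape and values are illustrative only.

```python
import numpy as np

# Assumed layout: column 0 holds x, column 1 holds y.
ind_x, ind_y = 0, 1

a = np.zeros((10, 2))
a += 4.0                    # whole-array arithmetic works on a plain ndarray
a[:, ind_y] += 1.0          # update one column by its named index

subset = a[2:5, ind_x]      # rows 2..4 of the x column
col_means = a.mean(axis=0)  # one mean per column

print(subset.shape)  # (3,)
```

Because the array is an ordinary float ndarray, reshaping, transposing, and reductions across columns all behave as usual.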
