I am interested in creating a 2D NumPy array where each row can be referred to by a unique field name. I have experimented with record arrays like this:

>>> a = np.recarray((2, 10), dtype=[('x', 'f8'), ('y', 'f8')])

But this fails when I do simple arithmetic like so:

>>> a += 4.0
TypeError: invalid type promotion

Is there a way to use named fields that doesn't require setting different data types for each field, or that won't fail when I try to do array math?

triphook

3 Answers

A recarray lets you access each field as an object attribute, so arithmetic on one field at a time works:

a.x += 4
a.y += 5
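
For reference, here is a minimal self-contained sketch of the same idea. It assumes the array is built with np.zeros and viewed as a recarray (np.recarray alone leaves the memory uninitialized), and the loop over a.dtype.names is just one possible way to apply the same offset to every field:

```python
import numpy as np

# Build a zero-initialized (2, 10) record array with two float64 fields.
dt = [('x', 'f8'), ('y', 'f8')]
a = np.zeros((2, 10), dtype=dt).view(np.recarray)

# Per-field arithmetic through attribute access.
a.x += 4
a.y += 5

# Apply the same offset to every field by looping over the field names.
for name in a.dtype.names:
    a[name] += 1.0
```

Each field is an ordinary (2, 10) float array, so normal broadcasting applies within a field.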
Hun

np.recarray((2, 10), dtype=[('x', 'f8'), ('y', 'f8')]) returns a 2x10 array in which each element is a record with two fields, x and y. An operation such as a + 4.0 is therefore undefined, because NumPy has no rule for adding a scalar to a whole record.

You have to access each field of a recarray independently:

a[0,0].x += 4.0
Roman Kh

recarray and structured arrays are not designed to be convenient ways of naming columns. They are meant to hold a mix of data types, the kind of thing you might load from a CSV file - strings, integers, floats, dates.

The operations that can be performed across fields are limited. As you found, you cannot add a value to the whole array. You have to add it field by field, provided the field type allows it. Similarly, you can't sum the two fields or take their mean with np.sum or np.mean. Nor can you reshape or transpose such an array in a way that exchanges fields for rows.

Constructing np.array with overlapping fields in dtype is a current SO question that illustrates several ways of accessing a couple of fields as a 2 column array.
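
As a rough sketch of one such approach (not necessarily the one used in that question), a structured array whose fields are all f8 can be reinterpreted as a plain float array with one extra trailing axis. This assumes a C-contiguous array and a dtype with no padding between fields; the name plain is only illustrative:

```python
import numpy as np

a = np.zeros((2, 10), dtype=[('x', 'f8'), ('y', 'f8')])
a['x'] = 1.0
a['y'] = 3.0

# Reinterpret the same memory as float64 and split the fields onto a
# trailing axis: plain[..., 0] is field x, plain[..., 1] is field y.
# Assumption: all fields are 'f8' and the array is C-contiguous.
plain = a.view('f8').reshape(a.shape + (2,))

# Whole-array arithmetic now works, and because plain is a view,
# the change is visible through the named fields as well.
plain += 4.0

# Reductions across fields become ordinary axis reductions.
field_mean = plain.mean(axis=-1)
```

If the fields had different types, this reinterpretation would not be valid, which is the situation structured arrays are really meant for.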

It is better to stick with normal nd arrays unless you really need the added flexibility of a structured array. If you want to access columns by name, consider defining variables, e.g. ind_x=0, ind_y=1, so you can use a[2:5, ind_x].
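
A minimal sketch of that suggestion, assuming a plain float array with one column per named quantity (the shape and the name col_sum are made up for illustration):

```python
import numpy as np

# Named column indices stand in for field names.
ind_x, ind_y = 0, 1

# A plain 2D float array: 10 rows, one column per named quantity.
a = np.zeros((10, 2))

# Whole-array arithmetic works, unlike with a structured array.
a += 4.0

# Columns are still addressed by name through the index variables.
a[2:5, ind_x] += 1.0
col_sum = a[:, ind_x] + a[:, ind_y]
```

The trade-off is that the names live in ordinary variables rather than in the array itself, so they are not carried along when the array is saved or passed around.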

hpaulj