I need to add a column of data to a numpy rec array. I have seen many answers floating around here, but they do not seem to work for a rec array that only contains one row...

Let's say I have a rec array x:

>>> x = np.rec.array([1, 2, 3])
>>> x
rec.array((1, 2, 3), 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])

and I want to append the value 4 as a new column with its own field name and data type, such as

 rec.array((1, 2, 3, 4), 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])

If I try to add a column using the normal append_fields approach:

>>> np.lib.recfunctions.append_fields(x, 'f3', 4, dtypes='<i8', 
usemask=False, asrecarray=True)

then I ultimately end up with

TypeError: len() of unsized object

It turns out that for a rec array with only one row, len(x) does not work, while x.size does. If I instead use np.hstack(), I get TypeError: invalid type promotion, and if I try np.c_, I get an undesired result:

>>> np.c_[x, 4]
array([[(1, 2, 3), (4, 4, 4)]], 
  dtype=(numpy.record, [('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]))
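The len() failure described above can be reproduced in isolation. The following is a minimal sketch, assuming only NumPy itself; the variable name x1 is introduced here for illustration. It shows that the array built from a flat list has shape (), which is why len() fails while .size works, and that np.atleast_1d gives a one-element view of the same record.

```python
import numpy as np

# A flat list is treated as one value per field, giving a 0-d ("scalar") record array.
x = np.rec.array([1, 2, 3])
print(x.shape, x.size)        # shape is (), size is 1

# len() is undefined for 0-d arrays, which is where the TypeError comes from.
try:
    len(x)
except TypeError as exc:
    print("len(x) failed:", exc)

# Promote the record to a 1-d array of length 1 (x1 is an illustrative name).
x1 = np.atleast_1d(x)
print(x1.shape, len(x1))
```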
pretzlstyle
  • The problem isn't with the dimension of `x`, but of `4`. `rf.append_fields(x, 'f3',[4], usemask=False)` – hpaulj May 18 '17 at 18:09
  • @hpaulj Did you actually try this? It makes no difference. I still get a `TypeError`. It doesn't even make sense that the solution would be to change the `4`... if the source code of the numpy function calls `len(x)` and `len(x)` throws an error, then it's as simple as that. – pretzlstyle May 18 '17 at 18:12
  • Sorry, my `x` had shape `(1,)`: `x = np.rec.array([(1, 2, 3)])`. My general impression is that the `recfunctions` are buggy and are not actively developed. More than once I've recommended working with structured arrays directly. – hpaulj May 18 '17 at 18:27
  • I agree with @hpaulj; it is probably a bug that `append_fields` doesn't work with a "scalar" recarray (i.e. an array with shape `()`). – Warren Weckesser May 18 '17 at 18:33

2 Answers

Create the initial array so that it has shape (1,); note the extra brackets:

In [17]: x = np.rec.array([[1, 2, 3]])

(If x is an input that you can't control that way, you could use x = np.atleast_1d(x) before using it in append_fields().)

Then make sure the value given in append_fields is also a sequence of length 1:

In [18]: np.lib.recfunctions.append_fields(x, 'f3', [4], dtypes='<i8', 
    ...: usemask=False, asrecarray=True)
Out[18]: 
rec.array([(1, 2, 3, 4)], 
          dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])
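Combining the two steps above for an input whose shape you cannot control might look like the sketch below. It assumes the 0-d array from the question; the names x1 and out are illustrative, and the submodule is imported explicitly because `np.lib.recfunctions` may not be available as an attribute without that import.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# The 0-d record array from the question.
x = np.rec.array([1, 2, 3])

# Give it shape (1,) so that append_fields can take its length.
x1 = np.atleast_1d(x)

# The new column must also be a length-1 sequence, hence [4] rather than 4.
out = rfn.append_fields(x1, 'f3', [4], dtypes='<i8',
                        usemask=False, asrecarray=True)
print(out)
print(out.dtype.names)
```

If a single record is wanted rather than a length-1 array, indexing the result with `out[0]` is one option.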
Warren Weckesser

Here's a way of doing the job without recfunctions:

In [64]: x = np.rec.array((1, 2, 3))
In [65]: y=np.zeros(x.shape, dtype=x.dtype.descr+[('f3','<i4')])
In [66]: y
Out[66]: 
array((0, 0, 0, 0), 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])
In [67]: for name in x.dtype.names: y[name] = x[name]
In [68]: y['f3']=4
In [69]: y
Out[69]: 
array((1, 2, 3, 4), 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])

From what I've seen of the recfunctions code, I think this is just as fast. Of course, for a single row speed isn't an issue. In general, those functions create a new 'blank' array with the target dtype and copy fields by name (possibly recursively) from the sources to the target. An array usually has many more records than fields, so iterating over the fields is, relatively speaking, not slow.
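The field-by-field copy described above can be wrapped in a small function. This is only a sketch: add_field is a hypothetical helper name, not part of NumPy, and it assumes a simple dtype with no padding or explicit offsets, since `dtype.descr` may not round-trip for those.

```python
import numpy as np

def add_field(a, name, value, dtype):
    """Hypothetical helper: copy structured array `a` and add one field."""
    # Target dtype: the existing fields plus the new one.
    new_dtype = np.dtype(a.dtype.descr + [(name, dtype)])
    # 'Blank' array with the same shape as the input; shape () works too.
    out = np.zeros(a.shape, dtype=new_dtype)
    # Copy the existing fields by name, then fill in the new field.
    for field in a.dtype.names:
        out[field] = a[field]
    out[name] = value
    return out

x = np.rec.array((1, 2, 3))
y = add_field(x, 'f3', 4, '<i4')
print(y.shape, y.dtype.names)
```

The result is a plain structured ndarray; `y.view(np.recarray)` gives attribute-style field access if that is needed.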

hpaulj