
I am trying to append an array to a numpy recarray using the numpy.lib.recfunctions append_fields function.

I receive a "TypeError: data type not understood" error if the recarray field names are unicode.

Is this behaviour as-designed, and if so, is there a work-around?

Using python 2.7 on a Win32 computer.

The code below demonstrates the problem (forgive the additional date-time manipulations - I wanted to create the problem as closely as possible to my original fault):

from datetime import datetime
import numpy as np
import numpy.lib.recfunctions as RF

x1=np.array([1,2,3,4])
x2=np.array(['a','dd','xyz','12'])
x3=np.array([1.1,2,3,4])

# this works
FileData = np.core.records.fromarrays([x1,x2,x3],names=['a','b','c'])
# this doesn't
#FileData = np.core.records.fromarrays([x1,x2,x3],names=[u'a',u'b',u'c'])

sDT = ['1/1/2000 12:00:00','1/1/2000 13:00:00','1/1/2000 14:00:00','1/1/2000 15:00:00']
pDT = [datetime.strptime(x, '%d/%m/%Y %H:%M:%S') for x in sDT]

# convert to unix timestamp
DT = [(np.datetime64(dt) - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's') for dt in pDT]

# add datetime to recarray
print FileData.dtype
FileData = RF.append_fields(FileData,'DateTime', data=DT)  #, dtypes='f8'
print FileData.dtype
RJCL
  • Digging further I find that both py2 and py3 expect the field names to be strings (as defined by the version). It's the `fromarrays()` function that is the anomaly, by allowing unicode names in py2. – hpaulj Feb 19 '15 at 06:23

1 Answer


Your code, with the unicode names, runs fine under Python 3, where unicode is the default string type.


Edit: Initially I thought the problem lay with masked arrays. But with further testing, I've concluded that the real issue is whether dtype can accept unicode names (in the py2 case) or not. Normally the names must be strings (however the Python version defines them), but the dictionary style of defining a dtype allows unicode names. This is why your fromarrays call works but append_fields does not.

I'm leaving the masked array discussion in place, since that is what you accepted.

Is it a problem with masked arrays?

Under Python 2.7, the full traceback is:

  File "stack28586238.py", line 22, in <module>
    FileData = RF.append_fields(FileData,'DateTime', data=DT)  #, dtypes='f8'
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/recfunctions.py", line 633, in append_fields
    base = merge_arrays(base, usemask=usemask, fill_value=fill_value)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/recfunctions.py", line 403, in merge_arrays
    return seqarrays.view(dtype=seqdtype, type=seqtype)
  File "/usr/local/lib/python2.7/site-packages/numpy/core/records.py", line 501, in view
    return ndarray.view(self, dtype, type)
  File "/usr/local/lib/python2.7/site-packages/numpy/ma/core.py", line 2782, in __array_finalize__
    _mask = getattr(obj, '_mask', make_mask_none(obj.shape, odtype))
  File "/usr/local/lib/python2.7/site-packages/numpy/ma/core.py", line 1566, in make_mask_none
    result = np.zeros(newshape, dtype=make_mask_descr(dtype))
  File "/usr/local/lib/python2.7/site-packages/numpy/ma/core.py", line 1242, in make_mask_descr
    return np.dtype(_recursive_make_descr(ndtype, np.bool))
TypeError: data type not understood

The error occurs in a masked array function, a call like:

np.ma.make_mask_none((3,),dtype=[(u'value','f4')])
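To make it clearer why the mask code ends up building a dtype at all, here is a minimal sketch of what make_mask_none returns for a structured dtype when the field name is a native string. It assumes Python 3 (where 'value' is the native string type), so it shows the working case rather than reproducing the py2 failure:

```python
import numpy as np

# Build an all-False mask for a structured dtype with one float field.
# The field name is a native str here, so np.dtype accepts it.
mask = np.ma.make_mask_none((3,), dtype=[('value', 'f4')])

# The mask keeps the field names but swaps every field type for bool.
print(mask.dtype)
print(mask)
```

The mask gets a dtype of its own with the same field names, so the names pass through np.dtype a second time. That is the call that fails in the traceback above.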

I came across a problem with masked arrays in a previous SO question (not too long ago). I'll have to see if it's related.

The question "Is the mask of a structured array supposed to be structured itself?" isn't directly related, but it does point to some faults when mixing masked arrays and structured arrays.

"Adding datetime field to recarray" is your earlier SO question regarding append_fields, focusing on the datetime type.

Or a problem with defining a dtype with unicode names?

By adding usemask=False I shift the error to another point, where it is trying to construct a dtype from the two component dtype lists: np.dtype(base.dtype.descr + data.dtype.descr).
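A rough sketch of that dtype-combining step on its own, assuming Python 3 and two made-up arrays (base and data are illustrative stand-ins, not the arrays from the question):

```python
import numpy as np

# Stand-ins for the existing record array and the field being appended.
base = np.zeros(3, dtype=[('a', 'i4')])
data = np.zeros(3, dtype=[('DateTime', 'f8')])

# Each .descr is a list of (name, format) tuples, so the two lists can be
# concatenated and handed back to np.dtype in the list-of-tuples form.
combined = np.dtype(base.dtype.descr + data.dtype.descr)
print(combined)
```

Because the combined dtype is built in the list-of-tuples form, every name in either descr list has to be one that this form accepts.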

In 2.7, we can construct a record array with unicode names:

In [11]: np.core.records.fromarrays([[0,1]],names=[u'test'])
Out[11]: 
rec.array([(0,), (1,)], 
      dtype=[(u'test', '<i4')])

But I can't construct a dtype directly with unicode name:

In [12]: np.dtype([(u'test', int)])
...    
TypeError: data type not understood

It appears that normally the dtype names must be strings of the interpreter's native type. In Python 3, np.dtype([(b'test', int)]) produces the same error.

Also in Python 3,

np.core.records.fromarrays([[0,1]],names=[b'test'])

produces a ValueError: field names must be strings.
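For anyone who wants to check this on their own install, a small probe along these lines should do. It assumes Python 3, and list_style_accepts is a hypothetical helper name of mine, not a NumPy function:

```python
import numpy as np

def list_style_accepts(name):
    """Return True if np.dtype accepts `name` in the list-of-tuples form."""
    try:
        np.dtype([(name, int)])
    except (TypeError, ValueError):
        # The exact exception type and message vary between NumPy versions.
        return False
    return True

native_ok = list_style_accepts('test')   # native str on Python 3
bytes_ok = list_style_accepts(b'test')   # the non-native string type on Python 3
print(native_ok, bytes_ok)
```

Under Python 2 the roles are reversed: the byte string is the native type and the u'...' literal is the one that gets rejected.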

np.core.records.fromarrays allows unicode because it uses format_parser:

p=np.format_parser(['int'],[u'test'],[])
p.dtype
# dtype([(u'test', '<i4')])

This works because the dictionary style of dtype definition accepts unicode:

np.dtype({'names':[u'test'],'formats':['int']})
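As a sanity check that the two spellings describe the same dtype, here is a minimal comparison. It assumes Python 3, where both forms accept the name, so it shows equivalence only and not the py2 difference in unicode handling:

```python
import numpy as np

# List-of-tuples form: one (name, format) tuple per field.
list_style = np.dtype([('test', 'int')])

# Dictionary form: parallel lists of names and formats.
dict_style = np.dtype({'names': ['test'], 'formats': ['int']})

print(list_style == dict_style)
```

So switching to the dictionary form changes only how the names are validated, not the resulting layout.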

Here's an example of merging two structured arrays with unicode names (working in py2):

In [53]: dt1 = np.dtype({'names':[u'test1'],'formats':['int']})
In [54]: dt2 = np.dtype({'names':[u'test2'],'formats':['float']})
In [55]: dt12 = np.dtype({'names':[u'test1',u'test2'],'formats':['int','float']})

In [56]: arr1 = np.array([(x,) for x in [1,2,3]],dtype=dt1)
In [57]: arr2 = np.array([(x,) for x in [1,2,3]],dtype=dt2)
In [58]: arr12 = np.zeros((3,),dtype=dt12)

In [59]: RF.recursive_fill_fields(arr1,arr12)
...

In [60]: RF.recursive_fill_fields(arr2,arr12)
Out[60]: 
array([(1, 1.0), (2, 2.0), (3, 3.0)], 
      dtype=[(u'test1', '<i4'), (u'test2', '<f8')])

append_fields does essentially this (with a few added bells and whistles).
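One possible work-around, and this is my own assumption rather than anything the NumPy docs recommend, is to cast the field names to the native string type before calling append_fields. The sketch below was written against Python 3, where the cast is a no-op, and I have not run it under 2.7; with_native_names is a hypothetical helper and the sample values are made up:

```python
import numpy as np
import numpy.lib.recfunctions as RF

def with_native_names(arr):
    """Return a copy of arr whose field names are cast with str().

    Hypothetical helper: fields are looked up by position so the
    original (possibly unicode) names are never used as dtype keys.
    """
    native = np.dtype([(str(name), arr.dtype[i])
                       for i, name in enumerate(arr.dtype.names)])
    return arr.astype(native)

rec = np.rec.fromarrays(
    [np.array([1, 2, 3, 4]), np.array([1.1, 2.0, 3.0, 4.0])],
    names=['a', 'c'])
seconds = np.array([0.0, 3600.0, 7200.0, 10800.0])

# usemask=False skips the masked-array path and returns a plain ndarray.
extended = RF.append_fields(with_native_names(rec), 'DateTime',
                            data=seconds, usemask=False)
print(extended.dtype.names)
```

If the names contain non-ASCII characters, str() will fail under Python 2, so this only helps with names like u'a' that are really plain ASCII.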


I've added a comment to the NumPy issue "dtype field names cannot be unicode (Trac #1814)": https://github.com/numpy/numpy/issues/2407

hpaulj