Your code, with the unicode names, runs fine under Python 3, where unicode is the default string type.
Edit: Initially I thought the problem lay with masked arrays. But after further testing, I've concluded that the real issue is whether dtype can accept unicode names (in the py2 case) or not. Normally the names must be strings (however the Python version defines them). But the dictionary style of defining a dtype allows unicode names. This is why your fromarrays call works, but not the append_fields call.
I'm leaving the masked array discussion in place, since that is what you accepted.
Is it a problem with masked arrays?
Under Python 2.7, the full traceback is:
File "stack28586238.py", line 22, in <module>
FileData = RF.append_fields(FileData,'DateTime', data=DT) #, dtypes='f8'
File "/usr/local/lib/python2.7/site-packages/numpy/lib/recfunctions.py", line 633, in append_fields
base = merge_arrays(base, usemask=usemask, fill_value=fill_value)
File "/usr/local/lib/python2.7/site-packages/numpy/lib/recfunctions.py", line 403, in merge_arrays
return seqarrays.view(dtype=seqdtype, type=seqtype)
File "/usr/local/lib/python2.7/site-packages/numpy/core/records.py", line 501, in view
return ndarray.view(self, dtype, type)
File "/usr/local/lib/python2.7/site-packages/numpy/ma/core.py", line 2782, in __array_finalize__
_mask = getattr(obj, '_mask', make_mask_none(obj.shape, odtype))
File "/usr/local/lib/python2.7/site-packages/numpy/ma/core.py", line 1566, in make_mask_none
result = np.zeros(newshape, dtype=make_mask_descr(dtype))
File "/usr/local/lib/python2.7/site-packages/numpy/ma/core.py", line 1242, in make_mask_descr
return np.dtype(_recursive_make_descr(ndtype, np.bool))
TypeError: data type not understood
The error occurs in a masked array function, a call like:
np.ma.make_mask_none((3,),dtype=[(u'value','f4')])
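For comparison, here is a minimal sketch of the same call with a plain str field name, which is what Python 3 gives you by default. The variable names are only illustrative; I'm assuming a reasonably recent numpy.

```python
import numpy as np

# With a str field name the mask dtype can be built without error.
mask = np.ma.make_mask_none((3,), dtype=[('value', 'f4')])

# The mask mirrors the field names of the data dtype,
# but every field is boolean and starts out False (nothing masked).
mask_names = mask.dtype.names
mask_is_bool = mask.dtype['value'] == np.dtype(bool)
```

So the mask machinery itself can handle a structured dtype; the failure above depends on the type of the field name.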
I came across a problem with masked arrays in a previous SO question (not too long ago). I'll have to see if it's related.
"Is the mask of a structured array supposed to be structured itself?" isn't directly related, but it does point to some faults when mixing masked arrays and structured arrays.
"Adding datetime field to recarray" is your earlier SO question regarding append_fields, focusing on the datetime type.
Or a problem with defining a dtype with unicode names?
By adding usemask=False I shift the error to another point, where it is trying to construct a dtype from the two component dtype lists: np.dtype(base.dtype.descr + data.dtype.descr).
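As a rough illustration of that step, the sketch below concatenates two descr lists into one dtype. It uses str names (so it runs under Python 3), and the arrays base and data are stand-ins I made up, not your actual FileData and DT.

```python
import numpy as np

# Stand-in arrays: one existing field, one field to be appended.
base = np.zeros(3, dtype=[('value', 'f4')])
data = np.zeros(3, dtype=[('DateTime', 'f8')])

# Each .descr is a list of (name, format) tuples, so the lists can be
# concatenated and passed back to np.dtype.
combined = np.dtype(base.dtype.descr + data.dtype.descr)
combined_names = combined.names
```

It is this list-of-tuples form of np.dtype that is sensitive to the type of the field names.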
In 2.7, we can construct a record array with unicode names:
In [11]: np.core.records.fromarrays([[0,1]],names=[u'test'])
Out[11]:
rec.array([(0,), (1,)],
dtype=[(u'test', '<i4')])
But I can't construct a dtype directly with a unicode name:
In [12]: np.dtype([(u'test', int)])
...
TypeError: data type not understood
It appears that normally the dtype names must be strings. In Python 3, np.dtype([(b'test', int)]) produces the same error.
Also in Python 3, np.core.records.fromarrays([[0,1]],names=[b'test']) produces a ValueError: field names must be strings.
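A small way to check this on your own install is sketched below for Python 3. The helper dtype_accepts is hypothetical (my own name, not part of numpy), and it catches both TypeError and ValueError because I would not rely on the exact exception type being the same across numpy versions.

```python
import numpy as np

def dtype_accepts(name):
    """Return True if np.dtype accepts `name` in the list-of-tuples form."""
    try:
        np.dtype([(name, int)])
    except (TypeError, ValueError):
        return False
    return True

str_ok = dtype_accepts('test')     # native str name
bytes_ok = dtype_accepts(b'test')  # non-native (bytes) name
```

On Python 3 I'd expect str_ok to be True and bytes_ok to be False, which mirrors the str/unicode split seen on Python 2.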
np.core.records.fromarrays allows unicode because it uses format_parser:
p=np.format_parser(['int'],[u'test'],[])
p.dtype
# dtype([(u'test', '<i4')])
This works because the dictionary style of dtype definition accepts unicode:
np.dtype({'names':[u'test'],'formats':['int']})
Here's an example of merging two structured arrays with unicode names (working in py2):
In [53]: dt1 = np.dtype({'names':[u'test1'],'formats':['int']})
In [54]: dt2 = np.dtype({'names':[u'test2'],'formats':['float']})
In [55]: dt12 = np.dtype({'names':[u'test1',u'test2'],'formats':['int','float']})
In [56]: arr1 = np.array([(x,) for x in [1,2,3]],dtype=dt1)
In [57]: arr2 = np.array([(x,) for x in [1,2,3]],dtype=dt2)
In [58]: arr12 = np.zeros((3,),dtype=dt12)
In [59]: RF.recursive_fill_fields(arr1,arr12)
...
In [60]: RF.recursive_fill_fields(arr2,arr12)
Out[60]:
array([(1, 1.0), (2, 2.0), (3, 3.0)],
dtype=[(u'test1', '<i4'), (u'test2', '<f8')])
append_fields does essentially this (with a few added bells and whistles).
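Wrapped up as a function, the idea might look like the sketch below. append_field_sketch is a hypothetical helper of mine, not the real append_fields; it assumes a flat (non-nested) structured array and builds the merged dtype with the dictionary style, so on Python 2 it should also tolerate unicode names. The demo uses str names and runs under Python 3.

```python
import numpy as np
from numpy.lib import recfunctions as RF

def append_field_sketch(base, name, data):
    """Return a new structured array: base's fields plus one extra field."""
    # Dictionary-style dtype definition, as in the examples above.
    names = list(base.dtype.names) + [name]
    formats = [base.dtype[n] for n in base.dtype.names] + [data.dtype]
    merged_dtype = np.dtype({'names': names, 'formats': formats})

    out = np.zeros(base.shape, dtype=merged_dtype)
    RF.recursive_fill_fields(base, out)  # copy the existing fields
    out[name] = data                     # fill the new field
    return out

arr1 = np.array([(x,) for x in [1, 2, 3]], dtype=[('test1', int)])
result = append_field_sketch(arr1, 'test2', np.array([1.0, 2.0, 3.0]))
```

It skips what append_fields adds on top: masking, fill values, and appending several fields at once.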
I've added a comment to https://github.com/numpy/numpy/issues/2407, "dtype field names cannot be unicode (Trac #1814)".