As noted, multifield selection is in a state of flux. I recently updated to 1.14.2, and the behavior is back to what it was before 1.14.0.
In [114]: data = np.array([(1.0, 2.0, 0), (3.0, 4.0, 1)],
...: dtype=[('feature_1', float), ('feature_2', float), ('result', int)])
...:
In [115]: data
Out[115]:
array([(1., 2., 0), (3., 4., 1)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8'), ('result', '<i8')])
In [116]: features = data[['feature_1', 'feature_2']]
In [117]: features
Out[117]:
array([(1., 2.), (3., 4.)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8')])
(I'm omitting the extra layer of recarray conversion.)
In 1.14.0 this dtype would include an offset value, indicating that features was a view, not a copy.
I can change values of features without changing data:
In [124]: features['feature_1']
Out[124]: array([1., 3.])
In [125]: features['feature_1'] = [4,5]
In [126]: features
Out[126]:
array([(4., 2.), (5., 4.)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8')])
In [127]: data
Out[127]:
array([(1., 2., 0), (3., 4., 1)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8'), ('result', '<i8')])
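Since the view-versus-copy behavior depends on the NumPy version, it may be safer to test for it than to assume it. A minimal sketch, using np.shares_memory to ask whether the selection still points at the original buffer, and an explicit .copy() when an independent array is wanted (the names is_view and independent are just for illustration):

```python
import numpy as np

data = np.array([(1.0, 2.0, 0), (3.0, 4.0, 1)],
                dtype=[('feature_1', float), ('feature_2', float), ('result', int)])

features = data[['feature_1', 'feature_2']]

# True if the selection shares a buffer with data (a view),
# False if it is a separate copy. The answer depends on the NumPy version.
is_view = np.shares_memory(features, data)
print("multifield selection is a view:", is_view)

# An explicit copy is independent whichever way the selection behaves.
independent = features.copy()
independent['feature_1'] = [4, 5]
print(data['feature_1'])  # data is untouched by the assignment above
```

Note that .copy() on a view may keep the view's padded dtype, so this guards against accidental modification but does not necessarily give a compact array.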
But without delving into the development discussion, I can't say what the long-term solution will be. Ideally there should be a way to fetch both a view (which maintains a link to the original data buffer) and a copy (an array that is independent and freely modifiable).
I suspect the copy version will follow a recfunctions practice of constructing a new array with the new dtype, and then copying data field by field.
In [132]: data.dtype.descr
Out[132]: [('feature_1', '<f8'), ('feature_2', '<f8'), ('result', '<i8')]
In [133]: dt = data.dtype.descr[:-1]
In [134]: dt
Out[134]: [('feature_1', '<f8'), ('feature_2', '<f8')]
In [135]: arr = np.zeros(data.shape, dtype=dt)
In [136]: arr
Out[136]:
array([(0., 0.), (0., 0.)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8')])
In [137]: for name in arr.dtype.fields:
...: arr[name] = data[name]
...:
In [138]: arr
Out[138]:
array([(1., 2.), (3., 4.)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8')])
or another recfunctions function (with numpy.lib.recfunctions imported as rf):
In [159]: rf.drop_fields(data, 'result')
Out[159]:
array([(1., 2.), (3., 4.)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8')])
recfunctions has code that can copy complex dtypes, ones with nested dtypes and such. But for a simple one-layered dtype like this, simple field name iteration is enough.
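For the simple one-layered case, the field-name iteration can be wrapped in a small helper. This is only a sketch: copy_fields is a hypothetical name, not a NumPy function, and it assumes the requested names are top-level fields of the array.

```python
import numpy as np

def copy_fields(arr, names):
    """Return a new, independent structured array holding only `names`.

    Hypothetical helper, not part of NumPy.
    """
    # dtype.fields maps each name to (dtype, offset[, title]); keep the dtype only,
    # so the new dtype is packed and carries no offsets from the original.
    new_dtype = np.dtype([(name, arr.dtype.fields[name][0]) for name in names])
    out = np.empty(arr.shape, dtype=new_dtype)
    for name in names:
        out[name] = arr[name]  # one vectorized copy per field
    return out

data = np.array([(1.0, 2.0, 0), (3.0, 4.0, 1)],
                dtype=[('feature_1', float), ('feature_2', float), ('result', int)])

features = copy_fields(data, ['feature_1', 'feature_2'])
features['feature_1'] = [4, 5]  # does not touch data
```

Because the loop runs once per field rather than once per record, the cost is dominated by a few array assignments.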
In general, structured arrays (and recarray) have many records, and a limited number of fields. So copying fields by name is relatively efficient.
In [150]: import numpy.lib.recfunctions as rf
In [154]: arr = np.zeros(data.shape, dtype=dt)
In [155]: rf.recursive_fill_fields(data, arr)
Out[155]:
array([(1., 2.), (3., 4.)],
dtype=[('feature_1', '<f8'), ('feature_2', '<f8')])
but note that the drop_fields code ends by doing much the same thing:
output = np.empty(base.shape, dtype=newdtype)
output = recursive_fill_fields(base, output)
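Since that code allocates a fresh array and then fills it, the result of drop_fields should be independent of the original. A quick check, sketched under the assumption that numpy.lib.recfunctions is importable as rf as in the session above (the variable name dropped is mine):

```python
import numpy as np
import numpy.lib.recfunctions as rf

data = np.array([(1.0, 2.0, 0), (3.0, 4.0, 1)],
                dtype=[('feature_1', float), ('feature_2', float), ('result', int)])

# usemask=False asks for a plain ndarray rather than a masked array.
dropped = rf.drop_fields(data, 'result', usemask=False)

# Modify the result and confirm the original is unchanged.
dropped['feature_1'] = [4, 5]
print(dropped)
print(data)
print("shares memory with data:", np.shares_memory(dropped, data))
```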
Development notes at some point alluded to a recfunctions.compress_fields function, but that apparently was never actually added.