A basic structured array gives you something that can be indexed with one name:
In [276]: dt=np.dtype([('A',int),('B',int),('C',int)])
In [277]: x=np.arange(9).reshape(3,3).view(dtype=dt)
In [278]: x
Out[278]:
array([[(0, 1, 2)],
[(3, 4, 5)],
[(6, 7, 8)]],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [279]: x['B'] # index by field name
Out[279]:
array([[1],
[4],
[7]])
In [280]: x[1] # index by row (array element)
Out[280]:
array([(3, 4, 5)],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [281]: x['B'][1]
Out[281]: array([4])
In [282]: x.shape # could be reshaped to (3,)
Out[282]: (3, 1)
The view approach produced a 2d array, but with just one column. The usual columns are replaced by dtype fields. It's 2d but with a twist. By using view
the data buffer is unchanged; the dtype just provides a different way of accessing those 'columns'. dtype
fields are, technically, not a dimension. They don't register in either the .shape
or .ndim
of the array. Also you can't use x[0,'A']
.
recarray
does the same thing, but adds the option of accessing fields as attributes, e.g. x.B
is the same as x['B']
.
rows
still have to be accessed by index number.
Another way of constructing a structured array is to defined values as a list of tuples.
In [283]: x1 = np.arange(9).reshape(3,3)
In [284]: x2=np.array([tuple(i) for i in x1],dtype=dt)
In [285]: x2
Out[285]:
array([(0, 1, 2), (3, 4, 5), (6, 7, 8)],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [286]: x2.shape
Out[286]: (3,)
ones
, zeros
, empty
also construct basic structured arrays
In [287]: np.ones((3,),dtype=dt)
Out[287]:
array([(1, 1, 1), (1, 1, 1), (1, 1, 1)],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
I can construct an array that is indexed with 2 field names, by nesting dtypes:
In [294]: dt1=np.dtype([('D',int),('E',int),('F',int)])
In [295]: dt2=np.dtype([('A',dt1),('B',dt1),('C',dt1)])
In [296]: y=np.ones((),dtype=dt2)
In [297]: y
Out[297]:
array(((1, 1, 1), (1, 1, 1), (1, 1, 1)),
dtype=[('A', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('B', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('C', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')])])
In [298]: y['A']['F']
Out[298]: array(1)
But frankly this is rather convoluted. I haven't even figured out how to set the elements to arange(9)
(without iterating over field names).
Structured arrays are most commonly produced by reading csv
files with np.genfromtxt
(or loadtxt
). The result is a named field for each labeled column, and a numbered 'row' for each line in the file.