A recarray is an array of records. Each record can have multiple fields. A record is sort of like a struct in C.
If the shape of the recarray is (133433,), then the recarray is a 1-dimensional array of records.
The fields of the recarray may be accessed by name-based indexing. For example, csv['nsub'] returns the nsub value of every record, and is essentially equivalent to
np.array([record['nsub'] for record in csv])
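As a minimal sketch of that equivalence, assuming a small made-up recarray in place of the real csv data (only the nsub field name comes from the question; the val field and all values are hypothetical):

```python
import numpy as np

# Hypothetical stand-in for the csv recarray; the 'val' field and the
# data are invented purely for illustration.
csv = np.rec.array(
    [(1, 2.5), (2, 3.5), (3, 4.5)],
    dtype=[('nsub', 'i8'), ('val', 'f8')],
)

# Name-based indexing pulls one field out of every record.
by_name = csv['nsub']

# The same values, gathered one record at a time.
by_loop = np.array([record['nsub'] for record in csv])

print(np.array_equal(by_name, by_loop))

# A recarray (unlike a plain structured array) also allows attribute access.
print(np.array_equal(csv.nsub, by_name))
```

One practical difference: csv['nsub'] is a view onto the original data, while the list comprehension builds a copy.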
This special name-based indexing supports the illusion that a 1-dimensional recarray is a 2-dimensional array: csv[intval] selects rows, csv[fieldname] selects "columns". However, under the hood and strictly speaking, if the shape is (133433,) then the array is 1-dimensional.
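The row/column illusion and its limit can be checked directly. This is only a sketch on a hypothetical three-record recarray (field names and data are made up), not the original csv data:

```python
import numpy as np

# Hypothetical 1-dimensional recarray.
csv = np.rec.array(
    [(1, 2.5), (2, 3.5), (3, 4.5)],
    dtype=[('nsub', 'i8'), ('val', 'f8')],
)

print(csv.shape, csv.ndim)   # a single axis

row = csv[1]                 # integer index: one record (a "row")
col = csv['val']             # field name: one value per record (a "column")
print(row['nsub'], col.shape)

# A genuine 2-D array would accept two indices; this one does not.
try:
    csv[0, 0]
    two_index_ok = True
except IndexError:
    two_index_ok = False
print(two_index_ok)
```

To get a single value, combine the two kinds of index in sequence, for example csv[0]['nsub'] or csv['nsub'][0].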
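Note that not all recarrays are 1-dimensional.
It is possible to have a higher-dimensional one. For example, here is a 2-dimensional structured array (the plain-ndarray counterpart of a recarray; name-based indexing works the same way):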
In [142]: arr = np.zeros((3,2), dtype=[('foo', 'int'), ('bar', 'float')])
In [143]: arr
Out[143]:
array([[(0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0)]],
      dtype=[('foo', '<i8'), ('bar', '<f8')])
In [144]: arr.shape
Out[144]: (3, 2)
This is a 2-dimensional array, whose elements are records.
Here are the bar field values in the arr[:, 0] slice:
In [148]: arr[:, 0]['bar']
Out[148]: array([ 0., 0., 0.])
Here are all the bar field values in the 2D array:
In [151]: arr['bar']
Out[151]:
array([[ 0., 0.],
       [ 0., 0.],
       [ 0., 0.]])
The result of field indexing is an ordinary NumPy array, so the usual methods apply. Here all() is False because every bar value is 0:
In [160]: arr['bar'].all()
Out[160]: False
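To round out the 2-dimensional example, here is a small sketch showing that field indexing returns a view you can write through, and that viewing the array as np.recarray adds attribute access. The assigned values 1.5 and 2.0 are arbitrary and chosen only for illustration:

```python
import numpy as np

arr = np.zeros((3, 2), dtype=[('foo', 'int'), ('bar', 'float')])

# Field indexing returns a view, so writing through it changes arr itself.
arr['bar'][:, 0] = 1.5
print(arr[:, 0]['bar'])

# Only the first column is nonzero so far, so all() is still False.
all_before = bool(arr['bar'].all())
arr['bar'][:, 1] = 2.0
all_after = bool(arr['bar'].all())
print(all_before, all_after)

# Viewing the structured array as np.recarray adds attribute-style access.
rec = arr.view(np.recarray)
print(rec.shape, rec.bar.shape)
```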
Note that an alternative to using recarrays is pandas DataFrames.
There are a lot more methods available for manipulating DataFrames than recarrays, so you might find them more convenient.
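If you want to try that route, a 1-dimensional structured array or recarray converts to a DataFrame fairly directly. The following is a sketch that assumes pandas is installed and uses made-up field names and data:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-dimensional structured array.
records = np.array(
    [(1, 2.5), (2, 3.5), (3, 4.5)],
    dtype=[('nsub', 'i8'), ('val', 'f8')],
)

# Each field becomes a DataFrame column.
df = pd.DataFrame.from_records(records)
print(df.columns.tolist())

# DataFrame methods such as boolean filtering and aggregation then apply.
mean_val = df.loc[df['nsub'] > 1, 'val'].mean()
print(mean_val)

# to_records converts back to a NumPy record array when needed.
back = df.to_records(index=False)
print(back['nsub'])
```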