
I am trying to construct a structured array in Python that can be accessed by the names of the columns and rows. Is this possible with the structured array method of numpy?

Example: My array should have roughly this form:

My_array =        A B C 
                E 1 2 3 
                F 4 5 6 
                G 7 8 9 

And I want to be able to do the following:

My_array["A"]["E"] = 1
My_array["C"]["F"] = 6

Is it possible to do this in Python using structured arrays, or is there another type of structure that is more suitable for such a task?

swot
  • You can use [pandas](http://pandas.pydata.org) – yangjie Jul 10 '15 at 09:42
  • @yangjie thanks, `pandas` looks promising. However, I have to pass these data through MPI interfaces, so `numpy` arrays would be a good solution, since they are faster and easier to pass through the interfaces. – swot Jul 10 '15 at 10:06

2 Answers


A basic structured array gives you something that can be indexed with one name:

In [276]: dt=np.dtype([('A',int),('B',int),('C',int)])
In [277]: x=np.arange(9).reshape(3,3).view(dtype=dt)
In [278]: x
Out[278]: 
array([[(0, 1, 2)],
       [(3, 4, 5)],
       [(6, 7, 8)]], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

In [279]: x['B']   # index by field name
Out[279]: 
array([[1],
       [4],
       [7]])

In [280]: x[1]    # index by row (array element)
Out[280]: 
array([(3, 4, 5)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

In [281]: x['B'][1]
Out[281]: array([4])

In [282]: x.shape    # could be reshaped to (3,)
Out[282]: (3, 1)

The view approach produces a 2d array, but with just one column; the usual columns are replaced by dtype fields. Because view leaves the data buffer unchanged, the dtype just provides a different way of accessing those 'columns'. dtype fields are, technically, not a dimension: they don't register in either the .shape or .ndim of the array. You also can't combine a row index and a field name in one index, as in x[0,'A']; use x[0]['A'] or x['A'][0] instead.

recarray does the same thing, but adds the option of accessing fields as attributes, e.g. x.B is the same as x['B'].

Rows still have to be accessed by index number.
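
If row names are needed anyway, one workaround is to keep a plain dict next to the structured array that maps each row label to its index. This is only a sketch: the `row` dict is a hypothetical helper, not a NumPy feature, and the labels E, F, G and the values 1..9 are taken from the question's example. I use np.int64 explicitly so the dtype is the same on every platform.

```python
import numpy as np

# One named field per column, as in the question (A, B, C).
dt = np.dtype([('A', np.int64), ('B', np.int64), ('C', np.int64)])
x = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], dtype=dt)

# Hypothetical helper: map row labels to integer positions.
row = {'E': 0, 'F': 1, 'G': 2}

a_e = x['A'][row['E']]   # column A, row E
c_f = x['C'][row['F']]   # column C, row F
print(a_e, c_f)          # prints: 1 6
```

The array itself stays an ordinary 1d structured array, so it should still be passable wherever a plain numpy buffer is expected; only the label lookup lives outside it.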

Another way of constructing a structured array is to define the values as a list of tuples.

In [283]: x1 = np.arange(9).reshape(3,3)
In [284]: x2=np.array([tuple(i) for i in x1],dtype=dt)
In [285]: x2
Out[285]: 
array([(0, 1, 2), (3, 4, 5), (6, 7, 8)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [286]: x2.shape
Out[286]: (3,)

ones, zeros, and empty also construct basic structured arrays:

In [287]: np.ones((3,),dtype=dt)
Out[287]: 
array([(1, 1, 1), (1, 1, 1), (1, 1, 1)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

I can construct an array that is indexed with 2 field names, by nesting dtypes:

In [294]: dt1=np.dtype([('D',int),('E',int),('F',int)])

In [295]: dt2=np.dtype([('A',dt1),('B',dt1),('C',dt1)])

In [296]: y=np.ones((),dtype=dt2)

In [297]: y
Out[297]: 
array(((1, 1, 1), (1, 1, 1), (1, 1, 1)), 
      dtype=[('A', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('B', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('C', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')])])

In [298]: y['A']['F']
Out[298]: array(1)

But frankly this is rather convoluted. I haven't even figured out how to set the elements to arange(9) (without iterating over field names).
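
One possible way to fill such a nested array without looping over field names is to view a flat buffer with the nested dtype. The sketch below is my own variation, not something from the session above: it names the inner fields E, F, G (the question's row labels) instead of D, E, F, and it uses np.int64 explicitly so the item sizes match on every platform. Because each outer field (A, B, C) holds one column, the question's row-major table has to be transposed before flattening.

```python
import numpy as np

# Inner dtype: one field per row label; outer dtype: one field per column.
dt1 = np.dtype([('E', np.int64), ('F', np.int64), ('G', np.int64)])
dt2 = np.dtype([('A', dt1), ('B', dt1), ('C', dt1)])

# The table from the question: rows E, F, G and columns A, B, C.
data = np.arange(1, 10, dtype=np.int64).reshape(3, 3)

# Transpose so each column's values are adjacent, then flatten.
# ravel() copies here, so `flat` is a new contiguous 1d buffer.
flat = data.T.ravel()

# 9 int64 values are 72 bytes, exactly one dt2 item; reshape to 0-d.
y = flat.view(dt2).reshape(())

print(y['A']['E'], y['C']['F'])   # prints: 1 6
```

Note that `y` shares memory with `flat` but not with `data`, since the transpose had to be copied to make it contiguous.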

Structured arrays are most commonly produced by reading csv files with np.genfromtxt (or loadtxt). The result is a named field for each labeled column, and a numbered 'row' for each line in the file.
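
As a small illustration of that, here is a sketch that reads an in-memory CSV instead of a file. The CSV text is made up from the question's example values, and passing a single dtype together with names=True is my choice to keep every field an int64.

```python
import io
import numpy as np

# Made-up CSV content: a header line followed by three data lines.
csv_text = "A,B,C\n1,2,3\n4,5,6\n7,8,9\n"

# names=True takes the field names from the first line;
# the single dtype is applied to every field.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=True, dtype=np.int64)

print(data.dtype.names)   # field names taken from the header
print(data['B'])          # one labeled column
print(data[1])            # one numbered 'row'
```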

hpaulj
  • I can't format this properly in a comment, so bear with me. To carry on from your line 298: `In [299]: z = y.view(type=np.recarray)`, then `In [300]: z.A['F']` or `In [3XX]: z.A.F` yields `array(1)`. recarrays just add the ability to write `arr['field']` as `arr.field`, so `z.A.F` 'looks' better than the indexing equivalent. –  Jul 10 '15 at 19:56
  • Yes, my `dt2` dtype can be cast as a `recarray` and accessed with `x.A.F`. The more complex the dtype the better `recarray` looks. – hpaulj Jul 10 '15 at 22:47
  • @hpaulj Thanks for the discussion of the different possibilities. I was not aware that `dtypes` could be nested. Maybe I will try out your last suggestion with the nested `dtypes` later. – swot Jul 13 '15 at 13:49

With a recarray, you can access columns with dot notation or by indexing with the column name. Rows are accessed by row number; I haven't seen them accessed via a row name. For example:

>>> import numpy as np
>>> a = np.arange(1,10,1).reshape(3,3)
>>> dt = np.dtype([('A','int'),('B','int'),('C','int')])
>>> a.dtype = dt
>>> r = a.view(type=np.recarray)
>>> r
rec.array([[(1, 2, 3)],
       [(4, 5, 6)],
       [(7, 8, 9)]], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
>>> r.A
array([[1],
       [4],
       [7]])
>>> r['A']
array([[1],
       [4],
       [7]])
>>> r.A[0]
array([1])
>>> a['A'][0]
array([1])
>>> # now for the row
>>> r[0]
rec.array([(1, 2, 3)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

You can specify the dtype and the type at the same time:

>>> a = np.ones((3,3))
>>> b = a.view(dtype= [('A','<f8'), ('B','<f8'),('C', '<f8')], type = np.recarray)
>>> b
rec.array([[(1.0, 1.0, 1.0)],
       [(1.0, 1.0, 1.0)],
       [(1.0, 1.0, 1.0)]], 
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
>>> b.A
array([[ 1.],
       [ 1.],
       [ 1.]])
>>> b.A[0]
array([ 1.])
  • Thanks for the suggestion. I think I can work around the indexing of the columns. However, if I use `np.zeros((3,3))` instead of arange and reshape, I get an array that has two triples in a row: `[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]`. But I just want one triple. – swot Jul 10 '15 at 10:21
  • `dtype= [('A', np.float), ('B', np.float), ('C', np.float)]` solves the problem. Can anyone tell me why? – swot Jul 10 '15 at 11:20
  • swot, I am adding the row thing to my original post –  Jul 10 '15 at 15:09
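
On the question in the comments about getting two triples per row: the likely cause is an item-size mismatch, since view only reinterprets the existing bytes. np.zeros((3,3)) holds 8-byte floats, so each row is 24 bytes; a dtype of three 4-byte fields is 12 bytes, so two items fit into one row. The sketch below assumes the fields were 4 bytes each, as the '<i4' output above suggests; the names `dt_4byte` and `dt_8byte` are mine.

```python
import numpy as np

a = np.zeros((3, 3))   # float64: 3 values * 8 bytes = 24 bytes per row

# Three 4-byte fields make a 12-byte item (assumed to match the '<i4' case).
dt_4byte = np.dtype([('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
# Three 8-byte fields make a 24-byte item, exactly one row.
dt_8byte = np.dtype([('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

two_per_row = a.view(dt_4byte)   # 24 / 12 = 2 items per row
one_per_row = a.view(dt_8byte)   # 24 / 24 = 1 item per row

print(two_per_row.shape)   # prints: (3, 2)
print(one_per_row.shape)   # prints: (3, 1)
```

So the field types should have the same byte size as the elements of the array being viewed, which is why float fields work on a float array.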