Is it possible to create a NumPy object that behaves very much like a collections.namedtuple, in the sense that elements can be accessed like so:

data[1] = 42
data['start date'] = '2011-09-20'  # Slight generalization of what is possible with a namedtuple

I tried to use a complex data type:

>>> import numpy
>>> data = numpy.empty(shape=tuple(), dtype=[('start date', 'S11'), ('n', int)])

This creates a 0-dimensional value with a kind of namedtuple type; it almost works:

>>> data['start date'] = '2011-09-20'
>>> data
array(('2011-09-20', -3241474627884561860), 
      dtype=[('start date', '|S11'), ('n', '<i8')])

However, element access does not work, because the "array" is 0-dimensional:

>>> data[0] = '2011-09-20'
Traceback (most recent call last):
  File "<ipython-input-19-ed41131430b9>", line 1, in <module>
    data[0] = '2011-09-20'
IndexError: 0-d arrays can't be indexed.

Is there a way of obtaining the desired behavior described above (item assignment through both a string and an index) with a NumPy object?

Eric O. Lebigot

4 Answers

You can do something like this using the numpy.rec module. What you need is the record class from this module, but I don't know how to create an instance of that class directly. One indirect way is to first create a recarray with a single entry and take that entry:

>>> import numpy
>>> a = numpy.recarray(1, names=["start date", "n"], formats=["S11", "i4"])[0]
>>> a[0] = "2011-09-20"
>>> a[1] = 42
>>> a
('2011-09-20', 42)
>>> a["start date"]
'2011-09-20'
>>> a.n
42

If you figure out how to create an instance of record directly, please let me know.
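
A slightly different route to the same kind of object, as a minimal sketch: allocate a one-element structured array, view it as a recarray, and take its only element. The names `dt`, `holder` and `rec` are made up for this illustration, and bytes literals are used so that the example does not depend on how a given NumPy version converts `str` values for an `S11` field.

```python
import numpy as np

# Same fields as in the answer above; "start date" contains a space on purpose.
dt = [("start date", "S11"), ("n", "i4")]

# One zero-initialized element, viewed as a recarray so that indexing
# yields a record object rather than a plain numpy.void.
holder = np.zeros(1, dtype=dt).view(np.recarray)
rec = holder[0]

rec[0] = b"2011-09-20"   # assignment by position
rec["n"] = 42            # assignment by field name

# Attribute access works for field names that are valid identifiers ("n"),
# but not for "start date", which still needs the string index.
print(rec["start date"], rec[1], rec.n)
```

Starting from `np.zeros` rather than an uninitialized recarray means that fields which are never assigned hold zeros instead of arbitrary memory contents.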

Sven Marnach

OK, I found a solution, but I would love to see a more elegant one:

data = numpy.empty(shape=1, dtype=[('start date', 'S11'), ('n', int)])[0]

This creates a 1-dimensional array with a single element and extracts that element. Element access then works with both strings and numerical indices:

>>> data['start date'] = '2011-09-20'  # Contains a space: more flexible than a namedtuple!
>>> data[1] = 123
>>> data
('2011-09-20', 123)

It would be nice if there was a way of directly constructing data, without having to first create an array with one element and extracting this element. Since

>>> type(data)
<type 'numpy.void'>

I'm not sure what NumPy constructor could be called… (there is no docstring for numpy.void).
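
Until a direct constructor turns up, the allocate-then-extract step can at least be hidden in a small helper. The following is only a sketch: `make_record` is a hypothetical name, not a NumPy function, and it uses a 0-dimensional array indexed with an empty tuple rather than a one-element array. Bytes literals are used for the `S11` field to keep the example independent of string conversion details.

```python
import numpy as np

def make_record(dtype):
    """Return a zero-initialized structured scalar (numpy.void) of the given dtype.

    Hypothetical helper for illustration; NumPy does not ship a function with this name.
    """
    # A 0-d array holds exactly one record; indexing it with () extracts that record.
    return np.zeros((), dtype=dtype)[()]

data = make_record([("start date", "S11"), ("n", int)])
data["start date"] = b"2011-09-20"  # assignment by field name
data[1] = 123                       # assignment by position
print(type(data).__name__, data[0], data["n"])
```

Using `np.zeros` instead of `np.empty` avoids the arbitrary initial value that shows up for `n` in the question.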

Eric O. Lebigot
  • Beat me to it! I won't delete my answer because it is slightly different: it also allows accessing the integer member as `data.n`. – Sven Marnach Sep 20 '11 at 19:10
  • @Sven Marnach: Yeah, we had the same idea. Having attribute access in addition might be useful. Like you, I'm still curious about the *direct* creation with NumPy of an object that behaves like `data` (or your `a`). – Eric O. Lebigot Sep 20 '11 at 19:47
  • I suspect the `record` class itself does not have any facilities for allocating the necessary memory, and you need to allocate the memory by some other mechanism. I guess it simply doesn't get any better than this, though I did not investigate this any further. – Sven Marnach Sep 20 '11 at 20:59
  • @SvenMarnach: The natural solution would have been to create a scalar (i.e. 0-dimensional) record, like in the original question, but NumPy understandably (and also unfortunately) does not like numerical indexing, in this case… – Eric O. Lebigot Sep 21 '11 at 15:29

This is nicely implemented by "Series" in the Pandas package.

For example from the tutorial:

>>> from pandas import Series
>>> import numpy as np
>>> s = Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
>>> s
a    -0.125628696947
b    0.0942011098937
c    -0.71375003803
d    -0.590085433392
e    0.993157363933
>>> s[1]
0.094201109893723267
>>> s['b']
0.094201109893723267

I've just been playing around with this for a few days, but it looks like it has a lot to offer.

tom10
  • Pandas looks interesting. However, the question is really about how to achieve the desired result in *NumPy*; the reason for this question was that NumPy appeared almost capable of doing what was wanted (other answers show that NumPy can indeed do it). – Eric O. Lebigot Sep 21 '11 at 15:27
  • It is implemented in NumPy, it's just been done by someone else and put together with some other tools and collected in a package called Pandas. But it's an interesting question *how* to do it yourself in NumPy too. – tom10 Sep 21 '11 at 16:05

(Edited, as EOL recommended, to answer the question more specifically.)

Create a 0-dimensional array (I didn't find a scalar constructor either):

>>> import numpy as np
>>> data0 = np.array(('2011-09-20', 0), dtype=[('start date', 'S11'), ('n', int)])
>>> data0.ndim
0

Access the element of the 0-dimensional array by indexing with an empty tuple:

>>> type(data0[()])
<class 'numpy.void'>
>>> data0[()][0]
b'2011-09-20'
>>> data0[()]['start date']
b'2011-09-20'

>>> # There is also an item() method, but it returns the element as a plain Python tuple
>>> type(data0.item())
<class 'tuple'>

I think the easiest approach is to think of structured arrays (or recarrays) as lists or arrays of tuples: indexing by name selects a column, and indexing by integer selects a row.

>>> tupleli = [('2011-09-2%s' % i, i) for i in range(5)]
>>> tupleli
[('2011-09-20', 0), ('2011-09-21', 1), ('2011-09-22', 2), ('2011-09-23', 3), ('2011-09-24', 4)]
>>> dt = [('start date', '|S11'), ('n', np.int64)]
>>> dt
[('start date', '|S11'), ('n', <class 'numpy.int64'>)]

Zero-dimensional array whose element is a tuple, i.e. one record (changed: this is not a scalar element, see the end):

>>> data1 = np.array(tupleli[0], dtype=dt)
>>> data1.shape
()
>>> data1['start date']
array(b'2011-09-20', 
      dtype='|S11')
>>> data1['n']
array(0, dtype=int64)

Array with one element:

>>> data2 = np.array([tupleli[0]], dtype=dt)
>>> data2.shape
(1,)
>>> data2[0]
(b'2011-09-20', 0)

1-dimensional array:

>>> data3 = np.array(tupleli, dtype=dt)
>>> data3.shape
(5,)
>>> data3[2]
(b'2011-09-22', 2)
>>> data3['start date']
array([b'2011-09-20', b'2011-09-21', b'2011-09-22', b'2011-09-23',
       b'2011-09-24'], 
      dtype='|S11')
>>> data3['n']
array([0, 1, 2, 3, 4], dtype=int64)

Direct indexing into a single record, the same as in EOL's example (I didn't know that this works):

>>> data3[2][1]
2
>>> data3[2][0]
b'2011-09-22'

>>> data3[2]['n']
2
>>> data3[2]['start date']
b'2011-09-22'

Trying to understand EOL's example: a scalar element and a zero-dimensional array are different things:

>>> type(data1)
<class 'numpy.ndarray'>
>>> type(data1[()])   #get element out of 0-dim array
<class 'numpy.void'>

>>> data1[0]
Traceback (most recent call last):
  File "<pyshell#98>", line 1, in <module>
    data1[0]
IndexError: 0-d arrays can't be indexed
>>> data1[()][0]
b'2011-09-20'

>>> data1.ndim
0
>>> data1[()].ndim
0

(Note: I accidentally typed the example into an open Python 3.2 interpreter, hence the b'...' prefixes.)

Josef
  • +1 because of the `data1[()]` indexing of 0-dimensional arrays, which I did not know about. I agree with all the rest. I would like to suggest that you put the final example on top of the text, because it is really the part which is an answer to the question. It's a good answer: there is no need to create a 1-dimensional array, so it's a little more elegant than my answer, or Sven's. (Still curious about *directly* creating a kind of named tuple within NumPy, though…) – Eric O. Lebigot Sep 23 '11 at 09:36
  • I looked around in the numpy (1.5) documentation and didn't find a constructor that would create a scalar element with structured dtype. – Josef Sep 23 '11 at 13:29
  • Just as an aside: I think this is an unusual question. I never needed the scalar constructors in numpy, mainly because of the overhead (numpy is designed for arrays), but also because it can lead to unpleasant surprises when the behavior of the basic Python types and the numpy types differs, and I don't want to always have to make sure that I have the right type. – Josef Sep 23 '11 at 13:41
  • I never needed NumPy scalar constructors either before. :) The thing is that I start from a NumPy structure/record array whose fields have names that contain spaces. This precludes the transfer of array lines to the standard Python named tuple type. However, I am in a case where calculations take data from the structure array and have to put the results in a single record that has the same fields; these calculations are clearer if field names are used in the code, hence the question. – Eric O. Lebigot Sep 23 '11 at 15:38
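
The use case described in the last comment could look roughly like the sketch below. The table contents, the variable names `table` and `summary`, and the choice of calculations are all invented for illustration; only the field names come from the thread.

```python
import numpy as np

# A small structured array whose first field name contains a space.
dt = [("start date", "S11"), ("n", np.int64)]
table = np.array(
    [(b"2011-09-20", 3), (b"2011-09-21", 5), (b"2011-09-22", 7)],
    dtype=dt,
)

# A single record with exactly the same fields as the rows of `table`,
# obtained from a 0-dimensional array indexed with an empty tuple.
summary = np.zeros((), dtype=table.dtype)[()]

# The calculations read and write by field name, which keeps them readable.
summary["start date"] = table["start date"][0]  # first row's date (assumes sorted rows)
summary["n"] = table["n"].sum()

# Positional access still works on the result.
print(summary[0], summary[1])
```

Reusing `table.dtype` guarantees that the result record has the same field names and types as the source array, spaces included.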