
I can use numpy's vectorize function to create an array of objects of some arbitrary class:

import numpy as np

class Body:
    """
    Simple class to represent a point mass in 2D space, more to 
    play with numpy than anything else...
    """

    def __init__(self, position, mass, velocity):
        self.position = position
        self.mass     = mass
        self.velocity = velocity

    def __repr__(self):
        return "m = {} p = {} v = {}".format(self.mass, 
                self.position, self.velocity)

if __name__ == '__main__':

    positions  = np.array([0 + 0j, 1 + 1j, 2 + 0j])
    masses     = np.array([2,      5,      1])
    velocities = np.array([0 + 0j, 0 + 1j, 1 + 0j])

    vBody  = np.vectorize(Body)

    points = vBody(positions, masses, velocities)

Now, if I wanted to retrieve a vector containing (say) the velocities from the points array, I could just use an ordinary Python list comprehension:

    v = [p.velocity for p in points]

But is there a numpy-thonic way to do it? On large arrays, would that be more efficient than the list comprehension?

TimGJ
  • The numpythonic way to do this is to *not use numpy*. – juanpa.arrivillaga Apr 23 '17 at 20:09
  • Or use [structs](https://docs.scipy.org/doc/numpy/user/basics.rec.html). But from what you are describing you just want a vanilla `list`. Because what you are doing is creating `object` dtype arrays, and those are basically less performant python lists, with none of the advantages of numpy arrays. – juanpa.arrivillaga Apr 23 '17 at 20:11
  • Not a good idea, but: `v = points[0].velocity.base` will get you back your original `velocities` array, usually – Eric Apr 23 '17 at 22:48

2 Answers

4

So, I would encourage you not to use numpy arrays with an object dtype. However, what you have here is essentially a struct, so you could use numpy to your advantage using a structured array. So, first, create a dtype:

>>> import numpy as np
>>> bodytype = np.dtype([('position', np.complex128), ('mass', np.float64), ('velocity', np.complex128)])

Then, initialize your body array (positions here is the array from the question):

>>> bodyarray = np.zeros((len(positions),), dtype=bodytype)
>>> bodyarray
array([(0j, 0.0, 0j), (0j, 0.0, 0j), (0j, 0.0, 0j)],
      dtype=[('position', '<c16'), ('mass', '<f8'), ('velocity', '<c16')])

Now, you can set your values easily:

>>> positions  = np.array([0 + 0j, 1 + 1j, 2 + 0j])
>>> masses     = np.array([2,      5,      1])
>>> velocities = np.array([0 + 0j, 0 + 1j, 1 + 0j])
>>> bodyarray['position'] = positions
>>> bodyarray['mass'] = masses
>>> bodyarray['velocity'] = velocities

And now you have an array of "bodies" that can take full advantage of numpy as well as letting you access "attributes" like this:

>>> bodyarray
array([(0j, 2.0, 0j), ((1+1j), 5.0, 1j), ((2+0j), 1.0, (1+0j))],
      dtype=[('position', '<c16'), ('mass', '<f8'), ('velocity', '<c16')])
>>> bodyarray['mass']
array([ 2.,  5.,  1.])
>>> bodyarray['velocity']
array([ 0.+0.j,  0.+1.j,  1.+0.j])
>>> bodyarray['position']
array([ 0.+0.j,  1.+1.j,  2.+0.j])
>>>

Note that the array is still one-dimensional, with one record per body:

>>> bodyarray.shape
(3,)
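As a minimal sketch of what "taking full advantage of numpy" could look like with this layout, the example below rebuilds the same three bodies and works on whole fields at once. The time step `dt` and the mass threshold are made-up values for illustration, not part of the original answer.

```python
import numpy as np

bodytype = np.dtype([('position', np.complex128),
                     ('mass', np.float64),
                     ('velocity', np.complex128)])

bodyarray = np.zeros(3, dtype=bodytype)
bodyarray['position'] = [0 + 0j, 1 + 1j, 2 + 0j]
bodyarray['mass'] = [2, 5, 1]
bodyarray['velocity'] = [0 + 0j, 0 + 1j, 1 + 0j]

# Momentum of every body in one vectorized expression (no Python loop).
momentum = bodyarray['mass'] * bodyarray['velocity']

# Advance every position by one time step; dt is a hypothetical value.
dt = 0.5
bodyarray['position'] += bodyarray['velocity'] * dt

# A boolean mask on one field selects whole records.
heavy = bodyarray[bodyarray['mass'] > 1.5]
```

Each field access returns a view onto the underlying buffer, which is why the in-place update of `bodyarray['position']` changes the records themselves.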
juanpa.arrivillaga
  • @TimGJ: This is a bit of a misuse of `complex` - you could use `np.dtype([('position', np.float64, 2), ...])` instead to store 2-vectors. Obviously this is much more extensible to 3D than using `complex` – Eric Apr 23 '17 at 22:47
  • @Eric ah, yes, I pretty much "transliterated" without giving it much thought. – juanpa.arrivillaga Apr 23 '17 at 23:48
  • Thanks, @juanpa.arrivillaga. The reason for me attempting to use a class was so I got access to all the other underlying OO features. But structs would appear to be what I want. Many thanks. – TimGJ Apr 24 '17 at 05:36
  • @TimGJ honestly, I think what you really want is to use regular old Python lists. – juanpa.arrivillaga Apr 24 '17 at 05:53
  • @juanpa.arrivillaga I agree. The purpose of the exercise was to extend my (limited) knowledge of numpy to try and use more OO features with it. – TimGJ Apr 25 '17 at 19:14
  • @TimGJ `numpy` data structures themselves can act as powerful backbones for custom classes. Unfortunately, they do not play well as *containers* of arbitrary Python objects, and are specifically designed for using **num**bers without all the overhead of the python numeric types. They are essentially wrappers around C arrays, with machine-code compiled routines to give you some serious horsepower for numeric calculations thrown in to boot. – juanpa.arrivillaga Apr 25 '17 at 19:57
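The suggestion in the comments to store 2-vectors as float sub-arrays rather than complex numbers could be sketched roughly as follows. The name `bodytype2d` and the choice to store velocity the same way are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical variant of the dtype: position and velocity as (x, y) pairs.
bodytype2d = np.dtype([('position', np.float64, (2,)),
                       ('mass', np.float64),
                       ('velocity', np.float64, (2,))])

bodies = np.zeros(3, dtype=bodytype2d)
bodies['position'] = [[0, 0], [1, 1], [2, 0]]
bodies['mass'] = [2, 5, 1]
bodies['velocity'] = [[0, 0], [0, 1], [1, 0]]

# Each vector field comes back as an (n, 2) float array.
speeds = np.linalg.norm(bodies['velocity'], axis=1)
```

Moving to 3D would then only require changing the sub-array shape from `(2,)` to `(3,)`.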
1

The straightforward list comprehension approach to creating points:

In [285]: [Body(p,m,v) for p,m,v in zip(positions, masses,velocities)]
Out[285]: [m = 2 p = 0j v = 0j, m = 5 p = (1+1j) v = 1j, m = 1 p = (2+0j) v = (1+0j)]
In [286]: timeit [Body(p,m,v) for p,m,v in zip(positions, masses,velocities)]
100000 loops, best of 3: 6.74 µs per loop

For this purpose (creating an array of objects), np.frompyfunc is faster than np.vectorize (though if you do use vectorize, you should specify otypes).

In [287]: vBody  = np.frompyfunc(Body,3,1)
In [288]: vBody(positions, masses, velocities)
Out[288]: 
array([m = 2 p = 0j v = 0j, m = 5 p = (1+1j) v = 1j,
       m = 1 p = (2+0j) v = (1+0j)], dtype=object)
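The remark about otypes can be illustrated with a short sketch. Without `otypes`, `np.vectorize` calls the function once on the first element just to work out the output type. Passing `otypes=[object]` skips that trial call and states the result dtype explicitly. The `Body` class is repeated from the question so the block runs on its own.

```python
import numpy as np

class Body:
    def __init__(self, position, mass, velocity):
        self.position = position
        self.mass = mass
        self.velocity = velocity

positions  = np.array([0 + 0j, 1 + 1j, 2 + 0j])
masses     = np.array([2, 5, 1])
velocities = np.array([0 + 0j, 0 + 1j, 1 + 0j])

# Declare the output dtype up front instead of letting vectorize infer it.
vBody = np.vectorize(Body, otypes=[object])
points = vBody(positions, masses, velocities)
```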

vectorize is slower than the comprehension, but this frompyfunc version is competitive:

In [289]: timeit vBody(positions, masses, velocities)
The slowest run took 12.26 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.56 µs per loop
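Outside IPython, a comparison along these lines could be reproduced with the standard `timeit` module. This is only a sketch: the repeat counts are arbitrary, and the numbers will depend on the machine and the NumPy version, so no expected timings are given.

```python
import timeit
import numpy as np

class Body:
    def __init__(self, position, mass, velocity):
        self.position = position
        self.mass = mass
        self.velocity = velocity

positions  = np.array([0 + 0j, 1 + 1j, 2 + 0j])
masses     = np.array([2, 5, 1])
velocities = np.array([0 + 0j, 0 + 1j, 1 + 0j])

vBody_frompyfunc = np.frompyfunc(Body, 3, 1)
vBody_vectorize = np.vectorize(Body, otypes=[object])

def with_comprehension():
    return [Body(p, m, v) for p, m, v in zip(positions, masses, velocities)]

candidates = {
    "list comprehension": with_comprehension,
    "frompyfunc": lambda: vBody_frompyfunc(positions, masses, velocities),
    "vectorize": lambda: vBody_vectorize(positions, masses, velocities),
}

n_calls = 2000  # arbitrary; raise it for steadier numbers
timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, reported per call in microseconds.
    best = min(timeit.repeat(func, number=n_calls, repeat=3))
    timings[name] = best / n_calls * 1e6
    print(f"{name}: {timings[name]:.2f} µs per call")
```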

vectorize/frompyfunc adds some useful functionality with broadcasting. For example, by using ix_, I can generate a cartesian product of your 3 inputs, giving a 3d array of 27 points, not just 3:

In [290]: points = vBody(*np.ix_(positions, masses, velocities))
In [291]: points.shape
Out[291]: (3, 3, 3)
In [292]: points
Out[292]: 
array([[[m = 2 p = 0j v = 0j, m = 2 p = 0j v = 1j, m = 2 p = 0j v = (1+0j)],
 ....
        [m = 1 p = (2+0j) v = 0j, m = 1 p = (2+0j) v = 1j,
         m = 1 p = (2+0j) v = (1+0j)]]], dtype=object)
In [293]: 
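To see why the result has shape (3, 3, 3), it may help to look at what `np.ix_` returns: each input is reshaped so that it varies along its own axis only, and broadcasting then combines them. A small sketch using the same three input arrays:

```python
import numpy as np

positions  = np.array([0 + 0j, 1 + 1j, 2 + 0j])
masses     = np.array([2, 5, 1])
velocities = np.array([0 + 0j, 0 + 1j, 1 + 0j])

# Each array gets a length-3 axis in a different position.
p, m, v = np.ix_(positions, masses, velocities)
shapes = (p.shape, m.shape, v.shape)

# Broadcasting the three together yields the full cartesian grid.
grid_shape = np.broadcast(p, m, v).shape
```

Here `shapes` is `((3, 1, 1), (1, 3, 1), (1, 1, 3))` and `grid_shape` is `(3, 3, 3)`, so the ufunc built by `frompyfunc` is called once per combination.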

In short, a 1d object array has few advantages compared to a list; it's only when you need to organize the objects in 2 or more dimensions that these arrays have advantages.

As for accessing attributes, you have to use either a list comprehension or the equivalent vectorize/frompyfunc operation.

In [294]: [x.position for x in points.ravel()]
Out[294]: 
[0j,
 0j,
 0j,
 ...
 (2+0j),
 (2+0j)]
In [295]: vpos = np.frompyfunc(lambda x:x.position,1,1)
In [296]: vpos(points)
Out[296]: 
array([[[0j, 0j, 0j],
        [0j, 0j, 0j],
     ...
        [(2+0j), (2+0j), (2+0j)],
        [(2+0j), (2+0j), (2+0j)]]], dtype=object)
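The extracted values above still live in an object array. If a numeric array is wanted, one option is to convert the result afterwards, sketched here for the 1d case with the velocities from the question. The helper name `get_velocity` is made up for this example.

```python
import numpy as np

class Body:
    def __init__(self, position, mass, velocity):
        self.position = position
        self.mass = mass
        self.velocity = velocity

positions  = np.array([0 + 0j, 1 + 1j, 2 + 0j])
masses     = np.array([2, 5, 1])
velocities = np.array([0 + 0j, 0 + 1j, 1 + 0j])

vBody = np.frompyfunc(Body, 3, 1)
points = vBody(positions, masses, velocities)

# frompyfunc always returns dtype=object, whatever the attribute type is.
get_velocity = np.frompyfunc(lambda body: body.velocity, 1, 1)
v_obj = get_velocity(points)

# Convert to a proper numeric array before doing arithmetic on it.
v = v_obj.astype(np.complex128)

# The list-comprehension route ends up in the same place.
v_from_list = np.array([body.velocity for body in points])
```

Either way, every element is still visited in Python, so neither route avoids the per-object overhead. The conversion only makes later arithmetic fast.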

The question "Tracking Python 2.7.x object attributes at class level to quickly construct numpy array" explores some alternative ways of storing/accessing object attributes.

hpaulj