Say we have a list of instances of a class, all of which have an attribute x that we know is a float. At various points in the program, we want to extract a numpy array of all the x values to run some analysis on their distribution. This extraction happens often, and it has been identified as a slow part of the program. Here is an extremely simple example that illustrates what I have in mind:
import numpy as np

# Create example object with list of values
class stub_object(object):
    def __init__(self, x):
        self.x = x

# Define a list of these fake objects
stubs = [stub_object(i) for i in range(10)]

# ...much later, want to quickly extract a vector of this particular attribute:
numpy_x_array = np.array([a_stub.x for a_stub in stubs])
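If the objects have to stay as they are, one small tweak is to feed the attribute values straight into np.fromiter, via operator.attrgetter, instead of building an intermediate list first. This is only a sketch: it still loops in Python, so any gain is likely modest rather than an order of magnitude. On Python 2.7, map returns a list, so itertools.imap would be the lazy equivalent.

```python
import numpy as np
from operator import attrgetter


class stub_object(object):
    def __init__(self, x):
        self.x = x


stubs = [stub_object(i) for i in range(10)]

# np.fromiter consumes the iterator directly, so no intermediate Python list
# of values is needed; passing count lets NumPy allocate the output once.
get_x = attrgetter("x")
numpy_x_array = np.fromiter(map(get_x, stubs), dtype=np.float64, count=len(stubs))
```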
Here's the question: is there a clever, faster way to track the "x" attribute across instances of stub_object in the "stubs" list, such that constructing the "numpy_x_array" is faster than the process above?
Here's a rough idea I am trying to hammer out: can I create a numpy array that is global to the class (a class attribute), which updates as the set of objects changes, and which I can operate on efficiently at any time?
All I am really looking for is a "nudge in the right direction." Keywords I can search for on Google, on SO, or in the docs are exactly what I need.
For what it is worth, I've looked into these questions, which have gotten me a little further but not all the way there:
- Getting attributes from arrays of objects in NumPy
    - I think the recarray solution won't work, as my objects are more complex than the "struct-like" objects described in the accepted answer.
- numpy array of objects
    - Vectorizing the init function is interesting, and I will try it (but I suspect it may get complicated given the structure of my real, non-stub_object __init__).
- Python attributes and numpy arrays
    - This question reminds me that numpy arrays are mutable, which may be the answer. Is this a feature, or a bug to be corrected in the future?
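On the last point: an array and its basic slices sharing memory is documented NumPy behavior (basic slicing returns a view), not a bug. The small sketch below shows that writes through a slice reach the base array, that integer-array indexing copies instead, and one pitfall worth knowing: growing an array allocates new memory, so older views keep pointing at the old buffer. np.shares_memory assumes NumPy 1.11 or later; on older versions, view.base is base is a rough substitute for basic slices.

```python
import numpy as np

# One shared buffer, standing in for a class-level array.
base = np.zeros(5, dtype=np.float64)

# Basic slicing returns a view: no data is copied.
view = base[2:3]
view[0] = 42.0                           # writing through the view...
changed = base[2]                        # ...changes the base array (42.0)
overlap = np.shares_memory(base, view)   # True: same underlying memory

# Integer-array ("fancy") indexing returns a copy instead.
copy = base[[2]]
copy[0] = -1.0                           # base[2] is unaffected

# Pitfall: building a bigger array allocates new memory, so old views
# keep pointing at the old buffer and do not see later writes.
grown = np.concatenate([base, np.zeros(5)])
grown[2] = 7.0                           # view[0] is still 42.0
```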
Others I looked at, which were not as helpful:
(One option, of course, is to "simply" overhaul the structure of the code so that, instead of a "stubs" list of "stub_objects," there is one large object, something like stub_population, which keeps the relevant attributes in lists and/or numpy arrays, with methods that act on elements of those arrays. The downside is a lot of refactoring, and some loss of the abstraction and flexibility that come from modeling the "stub_object" as its own thing. I'd like to avoid this if there is a clever way to do so.)
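For comparison, the stub_population refactor described above might look roughly like this minimal structure-of-arrays sketch. The scale and member_x methods are hypothetical stand-ins for whatever per-element operations the real code performs; the point is that the array already exists, so there is no extraction step, and bulk operations become single vectorized calls.

```python
import numpy as np


class stub_population(object):
    """Hypothetical structure-of-arrays container: one array per attribute."""

    def __init__(self, xs):
        # All x values live in one contiguous float64 array.
        self.x = np.asarray(xs, dtype=np.float64)

    def __len__(self):
        return len(self.x)

    def scale(self, factor):
        # A "method on elements" becomes one vectorized, in-place operation
        # instead of a Python loop over many small objects.
        self.x *= factor

    def member_x(self, i):
        # Per-member access is still possible, by index.
        return self.x[i]


pop = stub_population([i ** 2 for i in range(10)])
pop.scale(0.5)
numpy_x_array = pop.x  # no extraction step: the array already exists
```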
Edit: I am using Python 2.7.x.
Edit 2: @hpaulj, your example has been a big help -- answer accepted.
Here is an extremely simple first-pass version of the example code above that does what I want. Very preliminary tests suggest a possible order-of-magnitude speedup, without significantly rearranging the body of the code. Excellent. Thanks!
import numpy as np

size = 20
# Create example object with list of values
class stub_object(object):
    # Class-level array shared by every instance
    _x = np.zeros(size, dtype=np.float64)

    def __init__(self, x, i):
        # A quick cop-out instead of expanding the array when it is full:
        if i >= len(self._x):
            raise IndexError("Index i = %d is out of range; len(self._x) = %d"
                             % (i, len(self._x)))
        # Length-1 slice: a view into the shared array, not a copy
        self.x = self._x[i:i+1]
        self.set_x(x)

    def get_x(self):
        return self.x[0]

    def set_x(self, x_new):
        self.x[0] = x_new

# Examine:
# Define a list of these fake objects
stubs = [stub_object(x=i**2, i=i) for i in range(size)]
# ...much later, want to quickly extract a vector of this particular attribute:
#numpy_x_array = np.array([a_stub.x for a_stub in stubs])
# Now can do:
numpy_x_array = stub_object._x  # or
numpy_x_array = stubs[0]._x  # if need to use the list to access
I'm not using properties yet, but I really like that idea; it should go a long way toward keeping the rest of the code almost unchanged.
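A sketch of the property idea, assuming instances are never removed and the class is not subclassed: each instance stores an integer index into a class-level array instead of a length-1 view, and x becomes a property. Storing the index rather than a view is a deliberate choice. It lets the shared array grow by reallocation without leaving existing instances pointing at a stale buffer, which removes the need for the fixed size and the i argument. The all_x classmethod and the doubling growth policy are hypothetical choices, not anything from the accepted answer.

```python
import numpy as np


class stub_object(object):
    # Class-level storage shared by all instances (the "global to the class" array).
    # Assumes no subclasses: type(self) is always stub_object here.
    _x = np.zeros(4, dtype=np.float64)
    _count = 0  # number of slots handed out so far

    def __init__(self, x):
        cls = type(self)
        if cls._count == len(cls._x):
            # Grow by doubling. Instances store an index, not a view,
            # so they remain valid after the buffer is reallocated.
            cls._x = np.concatenate([cls._x, np.zeros(len(cls._x))])
        self._i = cls._count
        cls._count += 1
        self.x = x  # goes through the property setter below

    @property
    def x(self):
        return type(self)._x[self._i]

    @x.setter
    def x(self, value):
        type(self)._x[self._i] = value

    @classmethod
    def all_x(cls):
        # View of only the slots in use: no copy and no Python loop.
        return cls._x[:cls._count]


stubs = [stub_object(i ** 2) for i in range(10)]
stubs[3].x = -1.0                      # plain attribute syntax still works
numpy_x_array = stub_object.all_x()
```

Because all_x() returns a view, it should be called again after creating new instances; a view taken before the array grew would still refer to the old buffer.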