If you evaluate a Python expression for each element, it makes little difference whether the iteration itself is driven from compiled C code or from Python. What carries the weight is the cost of the evaluated (in-loop) Python expression. Even if that expression takes only about 1 microsecond (a very simple snippet), its cost will be significantly larger than the difference between a Python-driven and a C-driven loop. Every element still has to be marshalled between C values and PyObjects, and that applies to calling Python functions as well.
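A rough sketch for checking this on your own machine. The function `inner`, the array size and the repeat count are arbitrary choices for illustration. Absolute timings will vary. The point of the comparison is that the loop, the comprehension and `np.vectorize` all call `inner` once per element, whereas the last line does the same arithmetic entirely inside NumPy's compiled code.

```python
import timeit

import numpy as np


def inner(v):
    # A deliberately tiny per-element Python expression (hypothetical example)
    return v * 2 + 1


data = np.arange(10_000)


def python_loop():
    # Explicit Python loop, writing into a preallocated array
    out = np.empty(len(data), dtype=np.int64)
    for i, v in enumerate(data):
        out[i] = inner(v)
    return out


def comprehension():
    # List comprehension, then one conversion to an array
    return np.array([inner(v) for v in data])


# otypes avoids an extra call that vectorize would otherwise make to guess the dtype
vec_inner = np.vectorize(inner, otypes=[np.int64])


def vectorized():
    # Still one Python call to inner per element
    return vec_inner(data)


if __name__ == "__main__":
    for fn in (python_loop, comprehension, vectorized):
        t = timeit.timeit(fn, number=20)
        print(f"{fn.__name__:>14}: {t:.3f} s")
    # For contrast: the same arithmetic done by NumPy without calling Python per element
    t = timeit.timeit(lambda: data * 2 + 1, number=20)
    print(f"{'pure numpy':>14}: {t:.3f} s")
```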
For that reason, calling `vectorize` doesn't buy you speed: under the hood, what it calls for each element is still Python code. The idea behind `vectorize` is not performance but code readability and ease of iteration. `vectorize` performs introspection (of the function's parameters) and handles N-dimensional iteration for you. For example, `lambda x, y: x + y` automagically iterates over two inputs, broadcasting them against each other.
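A small sketch of both points, assuming current NumPy behaviour. It broadcasts a column against a row to get a 2-D result, and a counter, added here purely for illustration, shows that the wrapped Python function is still called once per output element. `otypes` is passed so that `vectorize` does not make an extra call to infer the output type.

```python
import numpy as np

calls = 0


def add(x, y):
    # Plain Python function; the global counter records every call NumPy makes
    global calls
    calls += 1
    return x + y


vadd = np.vectorize(add, otypes=[np.int64])

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,)
grid = vadd(col, row)             # broadcast to shape (3, 4)

print(grid.shape)  # (3, 4)
print(calls)       # 12: one Python call per output element
```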
So no, there's no "fast" way to run Python code per element. The speed that ultimately matters is the speed of your inner Python code.
Edit: your desired `hh.year` looks like the equivalent of Groovy's `hh*.year`. Even there, though, it is under the hood the same as an explicit loop. In Python, a comprehension is the fastest (and equivalent) way to do it. The real pity is being forced to write:
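For comparison, here is a sketch of the Python spellings of Groovy's `hh*.year`. It assumes, hypothetically, that `hh` is a list of `datetime.date` objects. Any objects with a `.year` attribute behave the same way. Both forms are plain Python-level loops over the elements.

```python
from datetime import date
from operator import attrgetter

# Hypothetical stand-in for the question's sequence of objects with a .year attribute
hh = [date(2001, 5, 1), date(1999, 12, 31), date(2010, 1, 1)]

# Closest Python analogue of Groovy's hh*.year: an explicit comprehension
years = [x.year for x in hh]

# Equivalent spelling with map/attrgetter; still one Python-level step per element
years_map = list(map(attrgetter("year"), hh))

print(years)  # [2001, 1999, 2010]
```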
years = np.array([x.year for x in hh])
(which forces you to create another, probably huge, temporary list) instead of being able to pass any kind of iterator:
years = np.array(x.year for x in hh)
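A quick sketch of why the second form is not an option, again using hypothetical `datetime.date` data. `np.array` does not consume a generator. It wraps the generator object itself in a 0-dimensional object array.

```python
from datetime import date

import numpy as np

hh = [date(2001, 5, 1), date(1999, 12, 31), date(2010, 1, 1)]  # hypothetical data

# Works: the list comprehension materialises a temporary Python list first
years = np.array([x.year for x in hh])

# Not what you'd hope: the generator object is stored as a single element
oops = np.array(x.year for x in hh)

print(years)                  # [2001 1999 2010]
print(oops.ndim, oops.dtype)  # 0 object
```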
Edit (suggestion by @Jaime): `np.array` can't construct an array from an iterator. For that, you can use `np.fromiter`:
np.fromiter((x.year for x in hh), dtype=int, count=len(hh))
This saves the time and memory of building an intermediate list. Passing `count=len(hh)` works for any sequence (which is your case) and lets NumPy allocate the result up front. It does not work for generators in general, because they have no `len()`. For such future cases, leave `count` out and `np.fromiter` will read until the iterator is exhausted.
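A sketch of both variants on hypothetical `datetime.date` data. The helper `year_stream` is made up for illustration and stands in for a generator whose length is not known in advance. With `count` omitted, `np.fromiter` uses its default of `-1` and reads everything the iterator yields.

```python
from datetime import date

import numpy as np

hh = [date(2001, 5, 1), date(1999, 12, 31), date(2010, 1, 1)]  # hypothetical data

# Sequence: len() is known, so count lets fromiter allocate the result once
years = np.fromiter((x.year for x in hh), dtype=int, count=len(hh))


def year_stream():
    # Hypothetical generator with no len()
    for d in hh:
        yield d.year


# No count: fromiter reads until the generator is exhausted
years_unsized = np.fromiter(year_stream(), dtype=int)

print(years)  # [2001 1999 2010]
```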