
Is there a faster/smarter way to perform operations on every element of a numpy array? What I specifically have is an array of datetime objects, e.g.:

import datetime as dt
import numpy as np

hh = np.array( [ dt.date(2000, 1, 1), dt.date(2001, 1, 1) ] )

To get a list of years from that I currently do:

years = np.array( [ x.year for x in hh ] )

Is there a smarter way to do this? I'm thinking something like

hh.year

which obviously doesn't work.

I have a script in which I constantly need different variations (year, month, hours...) of a (much longer) array like this. Of course I could always just define a separate array for each, but it feels like there should be a more elegant solution.

Lukas
  • Maybe use pandas's datetime64? Check the answer to this: http://stackoverflow.com/questions/13648774/get-year-month-or-day-from-numpy-datetime64 – ojy Aug 25 '14 at 22:34
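For reference, a minimal sketch of the datetime64 route the comment above points to, assuming the dates can be stored as numpy datetime64 values instead of Python date objects (the example data here is made up):

import numpy as np

# Hypothetical example data, stored as datetime64 instead of datetime.date objects
hh64 = np.array(['2000-01-01', '2001-01-01'], dtype='datetime64[D]')

# Casting to year precision and then to int gives years relative to 1970
years = hh64.astype('datetime64[Y]').astype(int) + 1970
print(years)  # [2000 2001]

With datetime64 the year extraction stays inside numpy, so no per-element Python call is needed.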

2 Answers

4

If you evaluate a Python expression for each element, it doesn't really matter whether the iteration itself is done in C or in Python: what dominates is the cost of the Python expression evaluated inside the loop. Even if that in-loop expression takes only a microsecond (i.e. it's very simple), it still outweighs the difference between a Python-level and a C-level iteration, because every call has to be "marshalled" between C and Python objects (and that applies to Python functions as well).

For that reason, vectorize is, under the hood, done in Python: what gets called inside the loop is Python code. The idea behind vectorize is not performance but code readability and ease of iteration: it inspects the function's parameters and handles N-dimensional iteration and broadcasting for you (e.g. a lambda x, y: x + y automatically works across two dimensions).
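As a rough illustration of that point (the names here are made up for the example, not taken from the question):

import numpy as np

# vectorize wraps a plain Python lambda; the lambda is still called once per element
add = np.vectorize(lambda x, y: x + y)

a = np.arange(3)             # shape (3,)
b = np.arange(3)[:, None]    # shape (3, 1)

# vectorize takes care of the two-dimensional broadcasting for you,
# which is the readability win, not a speed win
print(add(a, b))             # shape (3, 3)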

So no: there's no "fast" way to iterate over Python code. The speed that ultimately matters is the speed of your inner Python code.

Edit: your desired hh.year looks like the equivalent of hh*.year in Groovy, but even there, under the hood, it's the same per-element iteration. Comprehensions are the fastest (and equivalent) way to do this in Python. The real pity is being forced to write:

years = np.array( [ x.year for x in hh ] )

(which forces you to build another, possibly huge, intermediate list) instead of letting you pass any kind of iterator:

years = np.array( x.year for x in hh )

Edit (suggestion by @Jaime): you can't construct an array from an iterator with np.array. For that, use np.fromiter:

np.fromiter((x.year for x in hh), dtype=int, count=len(hh))

which saves you the time and memory of building an intermediate list. Passing count like this works for any sequence whose length is known up front (your case here); for other kinds of generators, where the length isn't known in advance, you can simply omit count.
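A small sketch of both cases, assuming hh is the array from the question:

import datetime as dt
import numpy as np

hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 1, 1)])

# hh is a sequence, so its length is known up front and can be passed as count,
# letting fromiter preallocate the output array
years = np.fromiter((x.year for x in hh), dtype=int, count=len(hh))

# For a generator whose length is not known in advance, simply omit count
# and fromiter grows the array as needed
months = np.fromiter((x.month for x in hh if x.year >= 2000), dtype=int)

No intermediate list is built in either case.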

Luis Masuelli
    There is [`np.fromiter`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html), so `np.fromiter((x.year for x in hh), dtype=int, count=len(hh))` is probably going to be as fast as it gets. – Jaime Aug 25 '14 at 23:01
  • `ufunc` is another mechanism. http://docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html It doesn't speed up the iteration, but gives access to features like ndimensions and broadcasting. – hpaulj Aug 25 '14 at 23:55
0

You can use numpy.vectorize.

In some quick benchmarks, performance is pretty similar (vectorize is slightly slower than a list comprehension), and in my opinion numpy.vectorize(lambda j: j.year)(hh) (or something similar) doesn't look particularly elegant either.
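For illustration, a rough sketch of that kind of comparison (the array size and repetition count are arbitrary choices for the example):

import datetime as dt
import timeit
import numpy as np

hh = np.array([dt.date(2000, 1, 1) + dt.timedelta(days=i) for i in range(10000)])

get_year = np.vectorize(lambda j: j.year)

# Both versions call Python code once per element, so the timings land in the same ballpark
print(timeit.timeit(lambda: np.array([x.year for x in hh]), number=100))
print(timeit.timeit(lambda: get_year(hh), number=100))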

colcarroll