6

I have this code:

output_array = np.vectorize(f, otypes='d')(input_array)

And I'd like to replace it with this code, which is supposed to give the same output:

output_array = np.ndarray(input_array.shape, dtype='d')
for i, item in enumerate(input_array):
    output_array[i] = f(item)

The reason I want the second version is that I can then start iterating on output_array in a separate thread, while it's being calculated. (Yes, I know about the GIL, that part is taken care of.)
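To make the intent concrete, here's a minimal sketch of the producer/consumer setup I have in mind (the `f` and the progress counter are illustrative stand-ins, not my real code):

```python
import threading
import numpy as np

def f(x):
    return x * x  # stand-in for the real per-item function

input_array = np.arange(10, dtype='d')
output_array = np.empty(input_array.shape, dtype='d')
n_done = 0  # number of items computed so far

def produce():
    global n_done
    for i, item in enumerate(input_array):
        output_array[i] = f(item)
        n_done = i + 1  # publish progress only after the slot is written

def consume():
    seen = 0
    while seen < len(input_array):
        if n_done > seen:
            # output_array[:n_done] is safe to read at this point
            seen = n_done

producer = threading.Thread(target=produce)
producer.start()
consume()  # consumes the array while it's still being filled
producer.join()
```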

Unfortunately, the for loop is very slow, even when I'm not processing the data on a separate thread. I benchmarked it on both CPython and PyPy3, which is my target platform. On CPython it's 3 times slower than `vectorize`, and on PyPy3 it's 67 times slower than `vectorize`!


That's despite the fact that the Numpy documentation says "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

Any idea why my implementation is slow, and how to make a fast implementation that still allows me to use output_array before it's finished?

Ram Rachum
  • For a start, don't use `np.ndarray`. Initialize an array with `np.zeros` or `np.empty`. But that said, I am surprised the `np.vectorize` is faster. Tell us something about the function. And `input_array` - what's the `dtype` and `shape` (roughly). – hpaulj Jul 10 '20 at 06:31
  • I'm not sure how relevant it is, but it's basically does a bunch of `math.sin` actions, multiplications and exponents. It's the `get_pressure` method here: https://github.com/cool-RR/python_synthesizer/blob/master/synthesizer.py#L51 – Ram Rachum Jul 10 '20 at 06:35
  • Could you replace the `math.sin` (etc) with `np.sin` and skip both `vectorize` and the loop? – hpaulj Jul 10 '20 at 06:37
  • I considered that, but I'll have to change the entire way my program is built, and I'd like to avoid that if possible. – Ram Rachum Jul 10 '20 at 06:44
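For reference, hpaulj's suggestion would look roughly like this (the formula below is a made-up stand-in for `get_pressure`, just to show the shape of the change):

```python
import math
import numpy as np

def f_scalar(t):
    # per-item version, in the style of the original code
    return math.sin(2 * math.pi * 440 * t) * math.exp(-t)

def f_array(t):
    # same formula on a whole array: np.sin/np.exp are C-level ufuncs,
    # so there are no per-element Python calls
    return np.sin(2 * np.pi * 440 * t) * np.exp(-t)

t = np.linspace(0, 1, 1000)
result = f_array(t)  # one vectorized pass over the whole array
```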

2 Answers

1

Sebastian Berg gave me a solution. When iterating over items from the input array, use `item.item()` rather than just `item`. This converts the `numpy.float64` objects into normal Python floats, making everything much faster and solving my particular problem :)
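In other words, the loop becomes something like this (with an illustrative `f`):

```python
import numpy as np

def f(x):
    return x * x  # stand-in for the real function

input_array = np.arange(5, dtype='d')
output_array = np.empty(input_array.shape, dtype='d')
for i, item in enumerate(input_array):
    # .item() converts the numpy.float64 scalar to a plain Python float,
    # so f runs on native floats instead of numpy scalar objects
    output_array[i] = f(item.item())
```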

Ram Rachum
0

Vectorization in numpy will provide a performance boost in numerical computation over a for loop, because it cuts down on some of the Python interpreter overhead of dynamically determining some of the characteristics of the data (like its type and location) at runtime.

An ndarray object does a lot of powerful things that optimize computational speed, like making sure the data it contains is homogeneous and contiguous in memory.

For n-dimensional data, by default, numpy stores values in C ordering (row major) such that the consecutive elements of a row are stored next to each other.

It is also a strided view on a block of data, meaning that when you initialize it, you're also saying exactly how many bytes of memory each item inside takes up -- in other words, how big a step you need to take to get to the next one.
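For example (the numbers below assume 8-byte floats in C order):

```python
import numpy as np

a = np.zeros((3, 4), dtype='d')  # 8-byte floats, row-major (C) order
print(a.strides)  # (32, 8): step 32 bytes to the next row, 8 bytes to the next column
```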

When you are using a vectorized function on an ndarray object, it's behaving more like C code than like Python. That is to say, it's operating directly on the values in memory and modifying them, and taking advantage of all the storage and type optimizations.

I suspect that when you don't vectorize your function f, you're paying the Python interpreter overhead for every element. When the function is vectorized successfully, most of its work is executed in C and avoids most of the slowdown that Python contributes. I had wondered whether enumerate was failing to take advantage of the underlying data structure the way numpy's nditer object would, but when I benchmarked nditer, numpy ufuncs, and an explicit for loop with enumerate, enumerate was actually the fastest iterator. So I'm guessing your custom function is the time culprit, which makes sense given that PyPy shows a much more dramatic slowdown.

Example benchmarks:

a = np.random.rand(10_000_000)  # example array; np.ndarray() with no shape isn't a valid call

>>> %%time
... for x in np.nditer(a, flags=['external_loop']):
...     x*x
CPU times: user 201 ms, sys: 219 ms, total: 420 ms
Wall time: 420 ms

>>> %%time
... np.square(a)
CPU times: user 201 ms, sys: 180 ms, total: 381 ms
Wall time: 380 ms

>>> %%time
... for i, x in enumerate(a):
...     x*x
CPU times: user 78.5 ms, sys: 1.79 ms, total: 80.3 ms
Wall time: 79.4 ms
Ray Johns
  • "I am guessing your custom function is probably the time culprit" How can that be true when the fast version is also calling my custom function? – Ram Rachum Jul 10 '20 at 08:12
  • When you successfully vectorize a function, numpy delegates a lot of what is actually happening in the Python function to the C code, rather than relying on the Python interpreter to do all the heavy lifting, which is the point of vectorizing Python functions in the first place -- to speed them up. – Ray Johns Jul 10 '20 at 08:18
  • I confirmed with a debugger that `vectorize` is actually calling into my Python functions, giving control back to the Python interpreter, and running all the lines on it one-by-one, using honest-to-God floats and ints. – Ram Rachum Jul 10 '20 at 08:25
  • Maybe it has to do with the way that you're accessing memory, then -- vectorizing is going to be really efficient because of striding and taking advantage of row-major format, but I don't think that a for loop with enumerate and bracket indexing gets the same efficiencies. – Ray Johns Jul 10 '20 at 08:50
  • I understand. Does NumPy provide anything similar to `vectorize` that lets me access the array before it's done calculating all the results? – Ram Rachum Jul 11 '20 at 09:58