
I have a mathematical function of the form $f(x)=\sum_{j=0}^N x^j \sin(jx)$ that I would like to compute efficiently in Python. N is of order ~100. This function f is evaluated thousands of times, for every entry x of a huge matrix, and therefore I would like to improve the performance (the profiler indicates that the calculation of f takes up most of the time). In order to avoid the loop in the definition of f I wrote:

def f(x):
    J = np.arange(0, N + 1)
    return sum(x**J * np.sin(J*x))

The issue is that if I want to evaluate this function for all entries of a matrix, I would need to use numpy.vectorize first, but as far as I know that is not necessarily faster than a for loop.

Is there an efficient way to perform a calculation of this type?

Alexei

2 Answers


Welcome to Stack Overflow! ^^

Well, calculating something ** 100 is fairly expensive. But notice that, when you declare your array J, you are forcing your function to calculate x, x^2, x^3, x^4, ... (and so on) independently.

Let us take for example this function (which is what you are using):

def powervector(x, n):
    return x ** np.arange(0, n)

And now this other function, which does not even use NumPy:

def power(x, n):
    result = [1., x]
    aux = x
    for i in range(2, n):
        aux *= x               
        result.append(aux)
    return result

Now, let us verify that they both calculate the same thing:

In []: sum(powervector(1.1, 10))
Out[]: 15.937424601000005

In []: sum(power(1.1, 10))
Out[]: 15.937424601000009

Cool, now let us compare the performance of both (in IPython):

In [36]: %timeit sum(powervector(1.1, 10))
The slowest run took 20.42 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 3.52 µs per loop

In [37]: %timeit sum(power(1.1, 10))
The slowest run took 5.28 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.13 µs per loop

It is faster, because you are not computing every power of x from scratch: you know that x^N == x^(N-1) * x and you take advantage of it.

You could use this to see if your performance improves. Of course you can change power() to use NumPy vectors as output. You can also have a look at Numba, which is easy to try and may improve performance a bit as well.
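By the way, the incremental idea does not have to leave NumPy: here is a sketch using np.cumprod (powervector_cumprod is just a name I made up) that builds the same power vector with a running product instead of independent exponentiations:

```python
import numpy as np

def powervector_cumprod(x, n):
    # [1, x, x, ..., x] -> running product gives [1, x, x^2, ..., x^(n-1)],
    # so each power reuses the previous one, like power() above
    return np.cumprod(np.concatenate(([1.0], np.full(n - 1, x))))
```

This stays vectorized, so it also works as a drop-in replacement for powervector() in the timings above.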

As you see, this is only a hint on how to improve some part of your problem. I bet there are a couple of other ways to further improve your code! :-)

Edit

It seems that Numba might not be a bad idea... Simply adding the @numba.jit decorator:

import numba

@numba.jit
def powernumba(x, n):
    result = [1., x]
    aux = x
    for i in range(2, n):
        aux *= x
        result.append(aux)
    return result

Then:

In [52]: %timeit sum(power(1.1, 100))
100000 loops, best of 3: 7.67 µs per loop

In [51]: %timeit sum(powernumba(1.1, 100))
The slowest run took 5.64 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 2.64 µs per loop

It seems Numba can do some magic there. ;-)
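For completeness, here is one way (a sketch; f_scalar is my own naming, and I use the same running-product trick as power() above, just with np.cumprod) to plug the incremental powers into the original f for a scalar x:

```python
import numpy as np

def f_scalar(x, n):
    j = np.arange(n + 1)
    # powers [x^0, ..., x^n] built with a running product instead of x**j
    powers = np.cumprod(np.concatenate(([1.0], np.full(n, x))))
    return np.sum(powers * np.sin(j * x))
```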

Peque
  • This is a great idea, and I will certainly implement something along these lines. However, in the future I'm expecting to replace the expression with something like sum_{j=0}^N g(x,C), for some function g, and therefore I'm trying to look for a more generic way to improve the code regardless of the function g. – PythonBeginner Feb 12 '16 at 21:04
  • @PythonBeginner: Numba will still apply in that case. Have a look at that project, you will not regret it! ;-) – Peque Feb 12 '16 at 21:11
  • First time I hear about Numba, I'll take a look at it :) – PythonBeginner Feb 12 '16 at 21:15

For a scalar x:

>>> import numpy as np
>>> x = 0.5
>>> jj = np.arange(10)
>>> x**jj
array([ 1.        ,  0.5       ,  0.25      ,  0.125     ,  0.0625    ,
        0.03125   ,  0.015625  ,  0.0078125 ,  0.00390625,  0.00195312])
>>> np.sin(jj*x)
array([ 0.        ,  0.47942554,  0.84147098,  0.99749499,  0.90929743,
        0.59847214,  0.14112001, -0.35078323, -0.7568025 , -0.97753012])
>>> (x**jj * np.sin(jj*x)).sum()
0.64489974041068521

Notice the use of the sum method of NumPy arrays (equivalently, use np.sum, not the built-in sum).

If your x is itself an array (here, a 1-D array with three entries), use broadcasting:

>>> a = x[:, None]**jj
>>> a.shape
(3, 10)
>>> x[0]**jj == a[0]
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

Then sum over the second axis:

>>> res = a * np.sin(jj * x[:, None])
>>> res.shape
(3, 10)
>>> res.sum(axis=1)
array([ 0.01230993,  0.0613201 ,  0.17154859])
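The same pattern wraps into a function for an arbitrarily-shaped array (f_matrix is a name introduced here; note the intermediate array is N+1 times the size of the input, so for a really huge matrix you may want to process it in chunks):

```python
import numpy as np

def f_matrix(X, n):
    jj = np.arange(n + 1)
    Xe = np.asarray(X)[..., None]   # add a trailing axis for j
    # broadcasts to shape X.shape + (n+1,), then reduces over j
    return (Xe**jj * np.sin(jj * Xe)).sum(axis=-1)
```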
ev-br