
I'm trying to evaluate a polynomial (3rd degree) using numpy. I found that doing it with plain Python code is much more efficient.

    import numpy as np
    import timeit

    m = [3,7,1,2]

    f = lambda m,x: m[0]*x**3 + m[1]*x**2 + m[2]*x + m[3]
    np_poly = np.poly1d(m)
    np_polyval = lambda m,x: np.polyval(m,x)
    np_pow = lambda m,x: np.power(x,[3,2,1,0]).dot(m)

    print 'result={}, timeit={}'.format(f(m,12),timeit.Timer('f(m,12)', 'from __main__ import f,m').timeit(10000))
    result=6206, timeit=0.0036780834198

    print 'result={}, timeit={}'.format(np_poly(12),timeit.Timer('np_poly(12)', 'from __main__ import np_poly').timeit(10000))
    result=6206, timeit=0.180546045303

    print 'result={}, timeit={}'.format(np_polyval(m,12),timeit.Timer('np_polyval(m,12)', 'from __main__ import np_polyval,m').timeit(10000))
    result=6206, timeit=0.227771043777

    print 'result={}, timeit={}'.format(np_pow(m,12),timeit.Timer('np_pow(m,12)', 'from __main__ import np_pow,m').timeit(10000))
    result=6206, timeit=0.168987989426

Did I miss something?

Is there another way in numpy to evaluate a polynomial?

Priyatham
mclafee

2 Answers


Some 23 years ago I checked out a copy of Press et al.'s Numerical Recipes in C from the university's library. There was a lot of cool stuff in that book, but there's a passage that has stuck with me over the years, on page 173:

We assume that you know enough never to evaluate a polynomial this way:

    p=c[0]+c[1]*x+c[2]*x*x+c[3]*x*x*x+c[4]*x*x*x*x;

or (even worse!),

    p=c[0]+c[1]*x+c[2]*pow(x,2.0)+c[3]*pow(x,3.0)+c[4]*pow(x,4.0);

Come the (computer) revolution, all persons found guilty of such criminal behavior will be summarily executed, and their programs won't be! It is a matter of taste, however, whether to write

    p = c[0]+x*(c[1]+x*(c[2]+x*(c[3]+x*c[4])));

or

    p = (((c[4]*x+c[3])*x+c[2])*x+c[1])*x+c[0];

So if you are really worried about performance, you want to try that; the differences will be huge for higher-degree polynomials:

In [24]: fast_f = lambda m, x: m[3] + x*(m[2] + x*(m[1] + x*m[0]))

In [25]: %timeit f(m, 12)
1000000 loops, best of 3: 478 ns per loop

In [26]: %timeit fast_f(m, 12)
1000000 loops, best of 3: 374 ns per loop
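
For arbitrary degree, the same Horner scheme is just a short loop (a sketch added for illustration; `horner` is not a numpy function, just a name introduced here):

```python
def horner(coeffs, x):
    # Evaluate a polynomial given its coefficients highest-degree
    # first (same convention as np.polyval), using Horner's scheme.
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

print(horner([3, 7, 1, 2], 12))  # 6206, same as the explicit lambda
```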

If you want to stick with numpy, there is a newer polynomial class that runs 2x faster than poly1d on my system, but it is still much slower than the plain Python lambdas above:

In [27]: np_fast_poly = np.polynomial.polynomial.Polynomial(m[::-1])

In [28]: %timeit np_poly(12)
100000 loops, best of 3: 15.4 us per loop

In [29]: %timeit np_fast_poly(12)
100000 loops, best of 3: 8.01 us per loop
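
The newer package also has a functional interface; a minimal sketch (note that numpy.polynomial takes coefficients in low-to-high order, the reverse of poly1d):

```python
import numpy as np
from numpy.polynomial import polynomial as P

m = [3, 7, 1, 2]         # highest-degree first, as in the question
c = m[::-1]              # numpy.polynomial wants lowest-degree first
print(P.polyval(12, c))  # 6206.0 (integer coefficients are cast to float)
```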
Jaime
  • +1 This is called [Horner's method](https://en.wikipedia.org/wiki/Horner's_method) or [Horner form](https://reference.wolfram.com/language/ref/HornerForm.html), and has made *huge* improvements to some of my code. – Mike Jun 14 '15 at 23:08
  • It should be emphasized that this isn't just about speed; Horner form also improves the accuracy of the result in a lot of cases. In particular, a polynomial written naively might involve a lot of cancellations, which leads to numerical errors if the sum of the terms is much smaller than the sum of the *absolute values of* the terms. In Horner form, such cancellations don't happen. – Mike Jun 21 '15 at 16:05
  • @Mike And there are better methods, too. "For a large class of polynomials, the standard method of polynomial evaluation, Horner's method, can be very inaccurate. The alternative method given here is on average 100 to 1000 times more accurate than Horner's Method." - Accurate Evaluation of Polynomials, Sutin 2007 – endolith May 16 '17 at 16:06

Well, looking at the implementation of polyval (the function that is eventually called when you evaluate a poly1d), it seems weird the implementor decided to include an explicit Python-level loop... From the source of numpy 1.6.2:

def polyval(p, x):
    p = NX.asarray(p)
    if isinstance(x, poly1d):
        y = 0
    else:
        x = NX.asarray(x)
        y = NX.zeros_like(x)
    for i in range(len(p)):
        y = x * y + p[i]
    return y

On one hand, avoiding the power operation should be advantageous speed-wise; on the other hand, the Python-level loop pretty much screws things up.

Here's an alternative numpy-ish implementation:

POW = np.arange(100)[::-1]
def g(m, x):
    return np.dot(m, x ** POW[-m.size:])

For speed, I avoid recreating the power array on each call. Also, to be fair when benchmarking against numpy, you should start with numpy arrays, not lists, to avoid the penalty of converting the list to a numpy array on each call.

So, after adding m = np.array(m), my g above runs only about 50% slower than your f.
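
To illustrate the fair-comparison point, here's a self-contained sketch (added for illustration; timings vary by machine):

```python
import numpy as np
import timeit

m = np.array([3, 7, 1, 2])   # numpy array up front, no per-call conversion
POW = np.arange(100)[::-1]   # [99, 98, ..., 1, 0]

def g(m, x):
    # take the last m.size exponents, i.e. [3, 2, 1, 0] for a cubic
    return np.dot(m, x ** POW[-m.size:])

f = lambda m, x: m[0]*x**3 + m[1]*x**2 + m[2]*x + m[3]

assert g(m, 12) == f(m, 12) == 6206
print(timeit.timeit(lambda: g(m, 12), number=10000))
print(timeit.timeit(lambda: f(m, 12), number=10000))
```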

Despite being slower on the example you posted, for evaluating a low-degree polynomial at a scalar x you really can't do much better than an explicit implementation like your f (of course you can, but probably not by much without resorting to lower-level code). However, for higher degrees (where you have to replace your explicit expression with some sort of loop), the numpy approach (e.g. g) proves much faster as the degree increases, and also for vectorized evaluation, i.e. when x is a vector.
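
To illustrate that last point (a sketch added here, not part of the original answer): when x is an array, np.polyval evaluates the polynomial at all points in one vectorized pass, while the scalar lambda has to be looped at the Python level:

```python
import numpy as np

m = [3, 7, 1, 2]
x = np.linspace(0.0, 2.0, 10000)

y_np = np.polyval(m, x)                  # one vectorized call

f = lambda m, x: m[0]*x**3 + m[1]*x**2 + m[2]*x + m[3]
y_py = np.array([f(m, xi) for xi in x])  # Python-level loop, much slower

assert np.allclose(y_np, y_py)
```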

shx2
  • "it seems weird the implementor decided to include an explicit loop" probably results in memory errors if you try to do it all in one block? – endolith Mar 31 '15 at 23:02
  • 2
    -1 The numpy source is evaluating in Horner form, which may not be easily vectorized, but is far superior to what you wrote. (See the other answer to this question.) – Mike Jun 14 '15 at 23:09