
I have data that I want to fit with polynomials. I have 200,000 data points, so I want an efficient algorithm. I want to use the numpy.polynomial package so that I can try different families and degrees of polynomials. Is there some way I can formulate this as a system of equations like Ax=b? Is there a better way to solve this than with scipy.minimize?

import numpy as np
from scipy.optimize import minimize as mini

x1 = np.random.random(2000)
x2 = np.random.random(2000)
y = 20 * np.sin(x1) + x2 - np.sin(30 * x1 - x2 / 10)

def fitness(x, degree=5):
    poly1 = np.polynomial.polynomial.polyval(x1, x[:degree])
    poly2 = np.polynomial.polynomial.polyval(x2, x[degree:])
    return np.sum((y - (poly1 + poly2)) ** 2)

# It seems like I should be able to solve this as a system of equations
# x = np.linalg.solve(np.concatenate([x1, x2]), y)

# minimize the sum of the squared residuals to find the optimal polynomial coefficients
x = mini(fitness, np.ones(10))
print(fitness(x.x))
kilojoules
  • Your problem is an example of a linear least squares problem. ("Linear" meaning linear in the free parameters.) The singular value decomposition is a standard method for solving such problems. A web search for these terms should turn up a lot of hits. Good luck and have fun. – Robert Dodier Jan 11 '18 at 21:15
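
A minimal sketch of the SVD route the comment points at, assuming `x1`, `x2`, `y` from the code above and the same 10-column design matrix that the answer below builds:

# Design matrix with plain powers of x1 and x2, as in the answer below.
A = np.stack([x1**p for p in range(5)] + [x2**p for p in range(5)]).T

# Thin SVD of A; np.linalg.lstsq does essentially this internally.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
coeffs = Vt.T @ ((U.T @ y) / s)  # pseudoinverse solution, assuming full column rank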

1 Answer


Your intuition is right. You can solve this as a system of equations of the form Ax = b.

However:

  • The system is overdetermined and you want the least-squares solution, so you need np.linalg.lstsq instead of np.linalg.solve.

  • You can't use polyval directly, because you need to separate the coefficients from the powers of the independent variable.

This is how to construct the system of equations and solve it:

A = np.stack([x1**0, x1**1, x1**2, x1**3, x1**4, x2**0, x2**1, x2**2, x2**3, x2**4]).T             
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx))  # test the result with original fitness function

Of course you can generalize over the degree:

A = np.stack([x1**p for p in range(degree)] + [x2**p for p in range(degree)]).T
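
For instance, with the same degree as above (a small usage sketch; `degree` is the number of coefficients per variable, i.e. powers 0 to degree-1):

degree = 5
A = np.stack([x1**p for p in range(degree)] + [x2**p for p in range(degree)]).T
xx = np.linalg.lstsq(A, y, rcond=None)[0]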

With the example data, the least-squares solution runs much faster than the minimize solution (800 µs vs. 35 ms on my laptop). However, A can become quite large, so if memory is an issue, minimize might still be an option.
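
If you want to reproduce that comparison yourself, here is a rough sketch (exact timings will differ per machine; `mini` and `fitness` come from the question, `A` from above):

import timeit

t_lstsq = timeit.timeit(lambda: np.linalg.lstsq(A, y, rcond=None), number=100) / 100
t_mini = timeit.timeit(lambda: mini(fitness, np.ones(10)), number=3) / 3
print(f"lstsq: {t_lstsq:.2e} s, minimize: {t_mini:.2e} s")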

Update:

Without any knowledge about the internals of the polynomial function things become tricky, but it is possible to separate terms and coefficients. Here is a somewhat ugly way to construct the system matrix A from a function like polyval:

def construct_A(valfunc, degree):
    columns1 = []
    columns2 = []
    for p in range(degree):
        # a coefficient vector with a single 1 picks out the p-th basis function
        c = np.zeros(degree)
        c[p] = 1
        columns1.append(valfunc(x1, c))
        columns2.append(valfunc(x2, c))
    return np.stack(columns1 + columns2).T

A = construct_A(np.polynomial.polynomial.polyval, 5)           
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx))  # test the result with original fitness function
MB-F
  • Thanks. I'm glad my intuition is right. I'm pretty sure this doesn't answer my question. I used `np.polynomial.polynomial` because it is a familiar example. But I actually want to use more complicated families of polynomials. For example, what if I want to use the hermite_e polynomials? – kilojoules Jan 11 '18 at 15:46
  • I'm pretty sure it *does* answer the question. The answer was *not possible*, which may not be what you hoped for - but it's still a valid answer. Anyway, you are lucky because it turns out it is possible after all :) See updated answer. (Simply pass `hermite_e.hermeval` to the function instead of `polyval` and you are good to go.) – MB-F Jan 11 '18 at 16:43
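
To make that last suggestion concrete, a short sketch (assuming `construct_A`, `x1`, `x2`, and `y` from above) that fits with the HermiteE basis instead of plain powers:

from numpy.polynomial import hermite_e

# Each column of A is now a probabilists' Hermite polynomial He_p evaluated at the data.
A = construct_A(hermite_e.hermeval, 5)
c = np.linalg.lstsq(A, y, rcond=None)[0]

# Check the fit by evaluating the two HermiteE series directly.
residual = y - (hermite_e.hermeval(x1, c[:5]) + hermite_e.hermeval(x2, c[5:]))
print(np.sum(residual ** 2))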