4

Polyfit is a great tool to fit a line to a set of points. However, my data has varying levels of statistical significance.

For example, for one point (x1,y1) I might only have 10 observations, while for another point (x2,y2) I might have 10,000 observations. I usually have at least 10 points, and I'd like to weight each according to its statistical significance when using polyfit. Is there any way (or a similar function) that allows for that?

Sam Odio
  • 2,717
  • 5
  • 22
  • 25

3 Answers

3

One possibility is to use weighted least squares in statsmodels.

Roughly:

y is the response or endogenous variable (endog)

x is your one-dimensional explanatory variable

w is your weight array; the higher the value, the more weight that observation gets

To build the polynomial design matrix and fit:

import numpy as np
import statsmodels.api as sm
exog = np.vander(x, degree+1)
result = sm.WLS(y, exog, weights=w).fit()

The parameters are in result.params; the fitted values are in result.fittedvalues.

Prediction has changed between versions. With version 0.4 you can use

result.predict(np.vander(x_new, degree+1))
Josef
  • 21,998
  • 3
  • 54
  • 67
2

More straightforward:

import numpy as np
# w is an array giving the weight of each observation
result = np.polynomial.polynomial.polyfit(x, y, deg, w=w)
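A small worked sketch with made-up data. Note that numpy applies each w[i] to the *unsquared* residual, so to weight a point that is the mean of n_i observations, pass something proportional to sqrt(n_i) (i.e. 1/sigma_i), not n_i itself:

```python
import numpy as np

# Hypothetical data: a linear trend, each point backed by a different
# number of underlying observations
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
counts = rng.integers(10, 10_000, size=x.size)   # observations per point
y = 1.0 + 2.0 * x + rng.normal(scale=0.05, size=x.size)

# w multiplies the unsquared residual y[i] - f(x[i]), so use sqrt(counts)
coefs = np.polynomial.polynomial.polyfit(x, y, deg=1, w=np.sqrt(counts))
print(coefs)   # lowest power first: [intercept, slope]
```

Unlike np.polyfit, np.polynomial.polynomial.polyfit returns coefficients in ascending order of degree.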
1

I do not know about numpy, but you can write your own polyfit function. Polyfit is just solving a system of linear equations.

http://en.wikipedia.org/wiki/Polynomial_regression#Matrix_form_and_calculation_of_estimates
(in your case epsilon is probably 0)

You can see that all you have to do is multiply each row of y and each row of X by your weight coefficient.
This should be about 10 lines of code (I remember it took me about 4 hours to reinvent the least-squares equations on my own, but only 2 lines of code in MATLAB).
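A rough sketch of the hand-rolled approach using the weighted normal equations from the linked article, beta = (XᵀWX)⁻¹XᵀWy with W = diag(w); the data here are invented:

```python
import numpy as np

def weighted_polyfit(x, y, deg, w):
    """Weighted least-squares polynomial fit via the normal equations:
    beta = (X^T W X)^{-1} X^T W y, with W = diag(w)."""
    X = np.vander(x, deg + 1)     # design matrix, highest power first
    XtW = X.T * w                 # broadcasting, equivalent to X.T @ np.diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Quick check against a known quadratic with small noise
rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 25)
y = 0.5 * x**2 - 1.0 * x + 3.0 + rng.normal(scale=0.01, size=x.size)
w = rng.uniform(1, 100, size=x.size)
print(weighted_polyfit(x, y, 2, w))   # approximately [0.5, -1.0, 3.0]
```

Solving the normal equations directly is fine for small, well-conditioned problems like this; for higher degrees a QR-based solver (e.g. np.linalg.lstsq on sqrt(w)-scaled rows) is numerically safer.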

Luka Rahne
  • 10,336
  • 3
  • 34
  • 56