4

Polyfit is a great tool to fit a line to a set of points. However, my data has varying levels of statistical significance.

For example, for one point (x1,y1) I might only have 10 observations, while for another point (x2,y2) I might have 10,000 observations. I usually have at least 10 points, and I'd like to weight each according to its statistical significance when using polyfit. Is there any way (or a similar function) that allows for that?

Sam Odio
  • 2,717
  • 5
  • 22
  • 25

3 Answers

3

One possibility is to use weighted least squares in statsmodels.

Roughly:

y is the response or endogenous variable (endog)

x is your one-dimensional explanatory variable

w is your weight array; the higher the value, the more weight that observation gets

To build the polynomial design matrix and fit:

import numpy as np
import statsmodels.api as sm
exog = np.vander(x, degree+1)
result = sm.WLS(y, exog, weights=w).fit()

The parameters are in result.params; the fitted values are in result.fittedvalues.

Prediction has changed between versions. With version 0.4 you can use

result.predict(np.vander(x_new, degree+1))
Josef
  • 21,998
  • 3
  • 54
  • 67
2

More straightforward:

import numpy as np
# w is an array giving the weight of each observation
result = np.polynomial.polynomial.polyfit(x, y, deg, w=w)
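A small worked sketch with made-up data. Note that numpy applies each w[i] to the *unsquared* residual, so to weight a point that is the mean of n_i observations, pass something proportional to sqrt(n_i) (i.e. 1/sigma_i), not n_i itself:

```python
import numpy as np

# Hypothetical data: a linear trend, each point backed by a different
# number of underlying observations
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
counts = rng.integers(10, 10_000, size=x.size)   # observations per point
y = 1.0 + 2.0 * x + rng.normal(scale=0.05, size=x.size)

# w multiplies the unsquared residual y[i] - f(x[i]), so use sqrt(counts)
coefs = np.polynomial.polynomial.polyfit(x, y, deg=1, w=np.sqrt(counts))
print(coefs)   # lowest power first: [intercept, slope]
```

Unlike np.polyfit, np.polynomial.polynomial.polyfit returns coefficients in ascending order of degree.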
1

I do not know about numpy, but you can write your own polyfit function. Polyfit is just solving a system of linear equations.

http://en.wikipedia.org/wiki/Polynomial_regression#Matrix_form_and_calculation_of_estimates
(in your case epsilon is probably 0)

You can see that all you have to do is multiply each row of y and each row of X by your weight coefficient.
This should be about 10 lines of code (I remember it took me about 4 hours to reinvent the least-squares equations on my own, but only 2 lines of code in MATLAB).
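A rough sketch of the hand-rolled approach using the weighted normal equations from the linked article, beta = (XᵀWX)⁻¹XᵀWy with W = diag(w); the data here are invented:

```python
import numpy as np

def weighted_polyfit(x, y, deg, w):
    """Weighted least-squares polynomial fit via the normal equations:
    beta = (X^T W X)^{-1} X^T W y, with W = diag(w)."""
    X = np.vander(x, deg + 1)     # design matrix, highest power first
    XtW = X.T * w                 # broadcasting, equivalent to X.T @ np.diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Quick check against a known quadratic with small noise
rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 25)
y = 0.5 * x**2 - 1.0 * x + 3.0 + rng.normal(scale=0.01, size=x.size)
w = rng.uniform(1, 100, size=x.size)
print(weighted_polyfit(x, y, 2, w))   # approximately [0.5, -1.0, 3.0]
```

Solving the normal equations directly is fine for small, well-conditioned problems like this; for higher degrees a QR-based solver (e.g. np.linalg.lstsq on sqrt(w)-scaled rows) is numerically safer.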

Luka Rahne
  • 10,336
  • 3
  • 34
  • 56