
I am trying to fit a second-order function with a zero intercept. Right now when I plot it, I get a curve with a y-intercept > 0. I am trying to fit

y**2 = 14.29566 * np.pi * x

or

y = np.sqrt(14.29566 * np.pi * x)

to two data sets x and y (where D = 3.57391553; note that 14.29566 ≈ 4 * D). My fitting routine is:

import numpy as np
import matplotlib.pyplot as plt

z = np.polyfit(x, y, 2)                       # generate curve coefficients
p = np.poly1d(z)                              # generate curve function

xp = np.linspace(0, catalog.tlimit, 10)       # generate input values

plt.scatter(x, y)
plt.plot(xp, p(xp), '-')
plt.show()

I have also tried using statsmodels.ols:

import statsmodels.api as sm

mod_ols = sm.OLS(y, x)
res_ols = mod_ols.fit()

but I don't understand how to generate coefficients for a second-order function as opposed to a linear one, nor how to force the y-intercept to 0. I saw another post dealing with forcing a y-intercept of 0 for a linear fit, but I couldn't figure out how to do the same with a second-order function.

[current plot: the fitted curve has a positive y-intercept; I want to anchor the curve at (0, 0)]

Data:

x = [0., 0.00325492, 0.00650985, 0.00976477, 0.01301969, 0.01627462, 0.01952954, 0.02278447, 
     0.02603939, 0.02929431, 0.03254924, 0.03580416, 0.03905908, 0.04231401]
y = [0., 0.38233801, 0.5407076, 0.66222886, 0.76467602, 0.85493378, 0.93653303, 1.01157129,
     1.0814152, 1.14701403, 1.20905895, 1.26807172, 1.32445772, 1.3785393]
Nikk O'ghaza
  • use np.vander to create the polynomial design matrix and remove the constant column, or x[:,None] ** np.arange(1, k+1), and then use OLS. – Josef Jan 20 '17 at 19:59
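
A minimal sketch of the design-matrix approach from Josef's comment, assuming x and y are the lists above (x_arr and X are just local names):

import numpy as np
import statsmodels.api as sm

x_arr = np.asarray(x)

# design matrix with columns [x, x**2] and no constant column,
# which forces the fitted curve through the origin
X = x_arr[:, None] ** np.arange(1, 3)

res = sm.OLS(y, X).fit()
print(res.params)   # one coefficient for x, one for x**2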

2 Answers


If I understood you correctly, you want to fit polynomial regression lines with OLS. You can try the following (as we can see, the higher the degree of the fitted polynomial, the more the model overfits the data):

import pandas as pd
import statsmodels.formula.api as smf

xp = np.linspace(0, 0.05, 10)   # generate input values
pred1 = p(xp)                   # prediction with np.poly1d, as you have done

data = pd.DataFrame(data={'x': x, 'y': y})

# let's first fit a 2nd-order polynomial with OLS, with intercept
olsres2 = smf.ols(formula='y ~ x + I(x**2)', data=data).fit()
print(olsres2.summary())


                             OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.978
Model:                            OLS   Adj. R-squared:                  0.974
Method:                 Least Squares   F-statistic:                     243.1
Date:                Sat, 21 Jan 2017   Prob (F-statistic):           7.89e-10
Time:                        04:16:22   Log-Likelihood:                 20.323
No. Observations:                  14   AIC:                            -34.65
Df Residuals:                      11   BIC:                            -32.73
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      0.1470      0.045      3.287      0.007         0.049     0.245
x             52.4655      4.907     10.691      0.000        41.664    63.267
I(x ** 2)   -580.4730    111.820     -5.191      0.000      -826.588  -334.358
==============================================================================
Omnibus:                        4.803   Durbin-Watson:                   1.164
Prob(Omnibus):                  0.091   Jarque-Bera (JB):                2.101
Skew:                          -0.854   Prob(JB):                        0.350
Kurtosis:                       3.826   Cond. No.                     6.55e+03
==============================================================================

pred2 = olsres2.predict(data)   # predict with the fitted model
# fit a 3rd-order polynomial with OLS, with intercept
olsres3 = smf.ols(formula='y ~ x + I(x**2) + I(x**3)', data=data).fit()
pred3 = olsres3.predict(data)   # predict

plt.scatter(x,y)
plt.plot(xp, pred1, '-r', label='np.poly')
plt.plot(x, pred2, '-b', label='ols.o2')
plt.plot(x, pred3, '-g', label='ols.o3')
plt.legend(loc='upper left')
plt.show()

[plot: data with np.poly1d and OLS polynomial fits, with intercept]

# now let's fit the polynomial regression lines, this time without intercept
olsres2 = smf.ols(formula='y ~ x + I(x**2) - 1', data=data).fit()
print(olsres2.summary())
                                OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.992
Method:                 Least Squares   F-statistic:                     889.6
Date:                Sat, 21 Jan 2017   Prob (F-statistic):           9.04e-14
Time:                        04:16:24   Log-Likelihood:                 15.532
No. Observations:                  14   AIC:                            -27.06
Df Residuals:                      12   BIC:                            -25.79
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x             65.8170      3.714     17.723      0.000        57.726    73.908
I(x ** 2)   -833.6787    109.279     -7.629      0.000     -1071.777  -595.580
==============================================================================
Omnibus:                        1.716   Durbin-Watson:                   0.537
Prob(Omnibus):                  0.424   Jarque-Bera (JB):                1.341
Skew:                           0.649   Prob(JB):                        0.511
Kurtosis:                       2.217   Cond. No.                         118.
==============================================================================
pred2 = olsres2.predict(data)
# fit a 3rd-order polynomial with OLS, without intercept
olsres3 = smf.ols(formula='y ~ x + I(x**2) + I(x**3) - 1', data=data).fit()
pred3 = olsres3.predict(data)

plt.scatter(x,y)
plt.plot(xp, pred1, '-r', label='np.poly')
plt.plot(x, pred2, '-b', label='ols.o2')
plt.plot(x, pred3, '-g', label='ols.o3')
plt.legend(loc='upper left')
plt.show()

[plot: data with np.poly1d and zero-intercept OLS polynomial fits]
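
If you want to draw the zero-intercept curve on a denser grid rather than only at the observed x values, you can predict on a small DataFrame (a sketch; xp_df and pred2_grid are just local names, and xp is the grid defined earlier):

xp_df = pd.DataFrame({'x': xp})       # the formula API builds I(x**2) from this column
pred2_grid = olsres2.predict(xp_df)   # evaluate the fitted quadratic on the grid
plt.plot(xp, pred2_grid, '-b')        # passes through (0, 0) by construction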

Sandipan Dey

This code provides quite a direct answer to your question. I must admit that I experimented with the patsy language first (see https://patsy.readthedocs.io/en/latest/).

The heart of this is the declaration of the model: np.power(y,2) ~ x - 1. It means: regress the square of y on x, omitting the intercept. Once the coefficient of x has been estimated, dividing it by pi gives the number you want.

x = [0., 0.00325492, 0.00650985, 0.00976477, 0.01301969, 0.01627462, 0.01952954, 0.02278447, 0.02603939, 0.02929431, 0.03254924, 0.03580416, 0.03905908, 0.04231401]
y = [0., 0.38233801, 0.5407076, 0.66222886, 0.76467602, 0.85493378, 0.93653303, 1.01157129,  1.0814152, 1.14701403, 1.20905895, 1.26807172, 1.32445772, 1.3785393]

data = { 'x': x, 'y': y }

import statsmodels.formula.api as smf
import numpy as np

model = smf.ols(formula = 'np.power(y,2) ~ x - 1', data = data)
results = model.fit()
print(results.params)

predicteds = results.predict(data)

import matplotlib.pyplot as plt

plt.scatter(x, y)                      # original data
plt.plot(x, [v**2 for v in y], 'go')   # the squared y values actually regressed
plt.plot(x, predicteds, '-b')          # fitted line through the origin
plt.show()

The regression coefficient is 44.911147. Here's the graphical result.

[plot: data, squared y values, and the fitted zero-intercept line]
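
As a quick sanity check of the division-by-pi step (a sketch; the 'x' key is the term name in results.params):

c = results.params['x'] / np.pi
print(c)   # ~14.2957, close to the 14.29566 constant in y**2 = 14.29566 * np.pi * x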


Bill Bell