
My goal is to fit a polynomial to some data and obtain the actual equation, including the fitted parameter values.

I adapted this example to my data and the outcome is as expected.

Here is my code:

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline


x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2])

x_plot = np.linspace(0, max(x), 100)
# create matrix versions of these arrays
X = x[:, np.newaxis]
X_plot = x_plot[:, np.newaxis]

plt.scatter(x, y, label="training points")

for degree in np.arange(3, 6, 1):
    model = make_pipeline(PolynomialFeatures(degree), Ridge())
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    plt.plot(x_plot, y_plot, label="degree %d" % degree)

plt.legend(loc='lower left')

plt.show()

[Plot: training points (scatter) with the fitted curves for degrees 3, 4, and 5]

However, I don't know how to extract the actual equation and the fitted parameter values for the respective fits. Where do I access the fitted equation?

EDIT:

The variable model has the following attributes:

model.decision_function  model.fit_transform      model.inverse_transform  model.predict            model.predict_proba      model.set_params         model.transform          
model.fit                model.get_params         model.named_steps        model.predict_log_proba  model.score              model.steps

model.get_params() does not return the fitted parameter values I am looking for.
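For illustration, here is what get_params() returns on this pipeline -- only the configuration of the steps, not anything fitted (the exact keys may vary between scikit-learn versions):

params = model.get_params()
print(sorted(params.keys()))
# e.g. ['polynomialfeatures', 'polynomialfeatures__degree', ...,
#       'ridge', 'ridge__alpha', ..., 'steps']
# hyperparameters only -- no fitted coefficients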


1 Answer


The coefficients of the linear model are stored in the intercept_ and coef_ attributes of the Ridge step of the pipeline.

You can see this more clearly by turning down the regularization and feeding in data from a known model; e.g.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# noiseless data from a known quadratic: y = -4 + 2x - 3x^2
x = 10 * np.random.random(100)
y = -4 + 2 * x - 3 * x ** 2

# tiny alpha effectively disables regularization; PolynomialFeatures
# supplies the bias column, so the intercept is fit as a coefficient
model = make_pipeline(PolynomialFeatures(2), Ridge(alpha=1E-8, fit_intercept=False))
model.fit(x[:, None], y)
ridge = model.named_steps['ridge']
print(ridge.coef_)
# array([-4.,  2., -3.])

Also note that PolynomialFeatures includes a bias term by default, so fitting the intercept in Ridge would be redundant for small alpha (hence fit_intercept=False above).
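Applied to the pipeline from the question, a minimal sketch along these lines prints the fitted equation for each degree. It assumes the default include_bias=True, in which case coef_[0] multiplies the all-ones bias column, so the full constant term is intercept_ + coef_[0]:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2])
X = x[:, np.newaxis]

for degree in np.arange(3, 6, 1):
    model = make_pipeline(PolynomialFeatures(degree), Ridge())
    model.fit(X, y)
    ridge = model.named_steps['ridge']
    coefs = ridge.coef_.copy()
    coefs[0] += ridge.intercept_  # fold the intercept into the bias term
    terms = " + ".join("%.3g*x^%d" % (c, i) for i, c in enumerate(coefs))
    print("degree %d: y = %s" % (degree, terms))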

  • Great, that works. Kind of hidden, in my opinion. I'll upvote it and accept it later on. – Cleb Nov 23 '15 at 17:34
  • It's "hidden" because scikit-learn is a machine learning library, not a statistical modeling library. In general, Machine Learning focuses on the outputs of models rather than the parameters of models. See [Statistical Modeling: The Two Cultures](https://projecteuclid.org/euclid.ss/1009213726) for a classic discussion of this divide. – jakevdp Nov 23 '15 at 17:36
  • Thanks for the link! Would you use scikit-learn for this kind of parameter estimation, or would something else be more appropriate? I am asking since I would like, e.g., to avoid negative values, and I am not sure how easy that would be using this module. – Cleb Nov 23 '15 at 17:41
  • 1
    The [statsmodels](http://statsmodels.sourceforge.net/) library might be a better choice for constrained statistical modeling like you have in mind. – jakevdp Nov 23 '15 at 17:44
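For example, a minimal unconstrained sketch with statsmodels on the same data (constrained or non-negative fits would need additional machinery on top of this):

import numpy as np
import statsmodels.api as sm

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2])

# design matrix with columns [x^3, x^2, x, 1] for a cubic fit
X = np.vander(x, 4)
result = sm.OLS(y, X).fit()
print(result.params)     # fitted coefficients, highest power first
print(result.summary())  # parameter estimates with standard errors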