
I'm new to machine learning and am stuck on this.

I'm trying to implement polynomial regression with a linear model, fitting several polynomial degrees in range(1, 10) and comparing their MSE. I'm using GridSearchCV to find the best parameters for the polynomial.

from sklearn.model_selection import GridSearchCV

poly_grid = GridSearchCV(PolynomialRegression(), param_grid, cv=10, scoring='neg_mean_squared_error')

I don't know how to construct the PolynomialRegression() estimator above. One solution I found was:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def PolynomialRegression(degree=2, **kwargs):
    return make_pipeline(PolynomialFeatures(degree), LinearRegression(**kwargs))

param_grid = {'polynomialfeatures__degree': np.arange(10),
              'linearregression__fit_intercept': [True, False],
              'linearregression__normalize': [True, False]}

poly_grid = GridSearchCV(PolynomialRegression(), param_grid, cv=10, scoring='neg_mean_squared_error')

But it didn't even generate any result.


2 Answers


poly_grid = GridSearchCV...

will only declare and instantiate the grid search object. You need to supply data to its fit() method to do any training or hyperparameter search.

Something like this:

poly_grid.fit(X, y)

Where X and y are your training data and labels.
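For example, with some made-up training data (the values below are purely illustrative), using the pipeline-based PolynomialRegression from the question:

import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))  # 100 samples, 1 feature
y = X.ravel() ** 3 - 2 * X.ravel() + rng.normal(size=100)  # noisy cubic target

poly_grid.fit(X, y)  # runs the full grid search with 10-fold CV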

Please see the documentation:

fit(X, y=None, groups=None, **fit_params)

Run fit with all sets of parameters.

And then use the cv_results_ and/or best_params_ to analyse the results.
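For instance, a quick sketch of inspecting the results (best_params_, best_score_ and cv_results_ are the standard GridSearchCV attributes):

print(poly_grid.best_params_)  # best combination found, e.g. the winning degree
print(poly_grid.best_score_)   # its mean cross-validated score (negated MSE here)

import pandas as pd
results = pd.DataFrame(poly_grid.cv_results_)
print(results[['param_polynomialfeatures__degree', 'mean_test_score']])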


Responding to comment:

@BillyChow Do you call poly_grid.fit() or not? If not, then obviously it won't produce any result.

If yes, then depending on your data it may take a lot of time, because you have specified degrees from 1 to 10 in the params with 10-fold CV. As the degree increases, the time to fit and cross-validate grows quickly.

If you still want to watch it work, you can add the verbose parameter to GridSearchCV, like this:

poly_grid = GridSearchCV(PolynomialRegression(), param_grid, 
                         cv=10, 
                         scoring='neg_mean_squared_error', 
                         verbose=3) 

And then call poly_grid.fit(X, y)

    Yes I've already done these steps before. But what I'm asking is how to build the polynomial classifier in gridsearchCV. or how to find best degree of polynomials through cross validation set. – Billy Chow Nov 22 '17 at 13:52
  • @BillyChow Please clarify what you mean by "didn't even generate any result" – Vivek Kumar Nov 22 '17 at 13:59
  • param_grid = {'polynomialfeatures__degree': [2,3,4,5], 'linearregression__fit_intercept': [True, False], 'linearregression__normalize': [True, False]} ....then.... poly_grid = GridSearchCV(PolynomialRegression(), param_grid, cv=10, scoring='neg_mean_squared_error') – Billy Chow Nov 22 '17 at 15:08
  • @BillyChow Why did you upvote and approve the answer? It has nothing to do with your question. – AturSams Feb 24 '19 at 11:00

Importing numpy and pandas:

import numpy as np
import pandas as pd

Creating a sample dataset:

df = pd.DataFrame(data={'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
                        'Y': [1, 4, 9, 16, 25, 36, 49, 64, 81, 100], 
                        'Label': [1, 3, 10, 17, 23, 45, 50, 55, 90, 114]})
X_train = df[['X', 'Y']]
y_train = df['Label']

In polynomial regression you're changing the degree of your dataset's features, that is, you're not actually changing a hyperparameter of the model. Therefore, I think that simulating GridSearchCV with for loops is a better idea than using GridSearchCV itself. In the following code, the list degrees holds the degrees that will be tested.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import cross_val_score

degrees = [2, 3, 4, 5, 6]  # Change degree "hyperparameter" here
normalizes = [True, False]  # Change normalize hyperparameter here
best_score = 0
best_degree = 0
best_normalize = False  # Initialize so the prints below work even if no score beats 0
for degree in degrees:
    for normalize in normalizes:
        poly_features = PolynomialFeatures(degree=degree)
        X_train_poly = poly_features.fit_transform(X_train)
        # Note: LinearRegression's `normalize` parameter was removed in scikit-learn 1.2;
        # on newer versions, scale the features separately (e.g. with StandardScaler).
        polynomial_regressor = LinearRegression(normalize=normalize)
        polynomial_regressor.fit(X_train_poly, y_train)  # not needed for scoring; cross_val_score refits clones internally
        scores = cross_val_score(polynomial_regressor, X_train_poly, y_train, cv=5)  # Change k-fold cv value here
        if max(scores) > best_score:
            best_score = max(scores)
            best_degree = degree
            best_normalize = normalize
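Note that selecting on max(scores) keeps the single best fold, which tends to be optimistic; the mean across folds is the more usual criterion. If you prefer that, a small variant of the selection step inside the loop:

        mean_score = scores.mean()  # average over the 5 folds
        if mean_score > best_score:
            best_score = mean_score
            best_degree = degree
            best_normalize = normalize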

Print the best score:

print(best_score)

0.9031682820376132

Print the best hyperparameters:

print(best_normalize)
print(best_degree)

False
2

Create the best polynomial regression using the best hyperparameters:

poly_features = PolynomialFeatures(degree=best_degree)
X_train_poly = poly_features.fit_transform(X_train)
best_polynomial_regressor = LinearRegression(normalize=best_normalize)
best_polynomial_regressor.fit(X_train_poly, y_train)
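To predict on new data (X_new below is a hypothetical sample), transform it with the same PolynomialFeatures object before calling predict:

X_new = pd.DataFrame(data={'X': [11], 'Y': [121]})  # hypothetical new sample
X_new_poly = poly_features.transform(X_new)  # same degree as training
print(best_polynomial_regressor.predict(X_new_poly))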