I am working on an ML regression problem where I defined the pipelines below, based on an online tutorial.

My code looks like this:

from sklearn import linear_model
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipe1 = Pipeline([('poly', PolynomialFeatures()),
                 ('fit', linear_model.LinearRegression())])
pipe2 = Pipeline([('poly', PolynomialFeatures()),
                 ('fit', linear_model.Lasso())])
pipe3 = Pipeline([('poly', PolynomialFeatures()),
                 ('fit', linear_model.Ridge())])
pipe4 = Pipeline([('poly', PolynomialFeatures()),
                 ('fit', linear_model.TweedieRegressor())])


models3 = {'OLS': pipe1,
           'Lasso': GridSearchCV(pipe2,
                                 param_grid=lasso_params).fit(X_train, y_train).best_estimator_,
           'Ridge': GridSearchCV(pipe3,
                                 param_grid=ridge_params).fit(X_train, y_train).best_estimator_,
           'Tweedie': GridSearchCV(pipe4,
                                   param_grid=tweedie_params).fit(X_train, y_train).best_estimator_}
test(models3, df)

While the above code worked fine and gave me the results, how can I get the list of polynomial features that were created?

Or how can I view them in the dataframe?

Alexander L. Hayes
The Great

1 Answer

You can use the transform method of the fitted PolynomialFeatures step to generate the polynomial feature matrix. To do so, first access that step in the pipeline; in this case it is at index 0. Here is how you can get the polynomial feature array for the tuned Lasso pipeline (built from pipe2):

feature_matrix = models3['Lasso'][0].transform(X_train)

Furthermore, if you wish to generate a DataFrame with the feature names, you can do so by using the get_feature_names_out method:

import pandas as pd

feature_names = models3['Lasso'][0].get_feature_names_out()
feature_df = pd.DataFrame(feature_matrix, columns=feature_names)
A.T.B
  • Nice. So this gives us all the polynomial features that were created in the pipeline? Will try it and update soon. – The Great Nov 26 '22 at 04:02
  • Yes, you can read more about it in the scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html – A.T.B Dec 01 '22 at 09:06