sklearn - How to retrieve PCA components and explained variance from inside a Pipeline passed to GridSearchCV

Question

I am using GridSearchCV with a pipeline as follows:

grid = GridSearchCV(
    Pipeline([
        ('reduce_dim', PCA()),
        ('classify', RandomForestClassifier(n_jobs = -1))
        ]),
    param_grid=[
        {
            'reduce_dim__n_components': range(0.7,0.9,0.1),
            'classify__n_estimators': range(10,50,5),
            'classify__max_features': ['auto', 0.2],
            'classify__min_samples_leaf': [40,50,60],
            'classify__criterion': ['gini', 'entropy']
        }
    ],
    cv=5, scoring='f1')

grid.fit(X,y)

How do I now retrieve PCA details like components and explained_variance from the grid.best_estimator_ model?

Furthermore, I also want to save the best_estimator_ to a file using pickle and later load it. How do I retrieve the PCA details from this loaded estimator? I suspect it will be the same as above.

I don't follow the PCA part of your grid: `'reduce_dim__n_components': range(0.7,0.9,0.1)`. What range of values is this meant to produce? — guy, Dec 09 '17 at 04:11

score 5 · Accepted Answer · answered Oct 18 '17 at 01:47

grid.best_estimator_ is the pipeline fitted with the best parameters found by the search.

Use its named_steps attribute to access the individual estimators inside the pipeline.

So grid.best_estimator_.named_steps['reduce_dim'] will give you the fitted PCA object. You can then read the components_ and explained_variance_ attributes from that object like this:

grid.best_estimator_.named_steps['reduce_dim'].components_
grid.best_estimator_.named_steps['reduce_dim'].explained_variance_
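
For completeness, here is a minimal end-to-end sketch that also covers the pickle part of the question. It is only an illustration under some assumptions: the data is synthetic (make_classification stands in for the real X and y), the grid is much smaller than the one in the question, and n_components is given as a list of floats because range() only accepts integers. With svd_solver='full', a float in (0, 1) tells PCA to keep enough components to explain that fraction of the variance.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: synthetic binary data in place of the question's X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ('reduce_dim', PCA(svd_solver='full')),
    ('classify', RandomForestClassifier(random_state=0)),
])

# Floats in (0, 1) are fractions of variance to retain, passed as a list.
param_grid = {
    'reduce_dim__n_components': [0.7, 0.8, 0.9],
    'classify__n_estimators': [10, 20],
}

grid = GridSearchCV(pipe, param_grid=param_grid, cv=3, scoring='f1')
grid.fit(X, y)

# The fitted PCA step of the best pipeline.
pca = grid.best_estimator_.named_steps['reduce_dim']
print(pca.components_.shape)          # (n_components_, n_features)
print(pca.explained_variance_)
print(pca.explained_variance_ratio_)

# Pickle round trip: the restored pipeline exposes the same attributes.
restored = pickle.loads(pickle.dumps(grid.best_estimator_))
restored_pca = restored.named_steps['reduce_dim']
print(restored_pca.components_.shape)
```

Saving to disk works the same way with pickle.dump and pickle.load on a file opened in binary mode. The loaded object is an ordinary fitted Pipeline, so named_steps['reduce_dim'] is accessed on it directly, without going through grid.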

This is perfect. Thanks a bunch! — shikhanshu, Oct 18 '17 at 02:23
