
I am using GridSearchCV with a pipeline as follows:

grid = GridSearchCV(
    Pipeline([
        ('reduce_dim', PCA()),
        ('classify', RandomForestClassifier(n_jobs = -1))
        ]),
    param_grid=[
        {
            'reduce_dim__n_components': range(0.7,0.9,0.1),
            'classify__n_estimators': range(10,50,5),
            'classify__max_features': ['auto', 0.2],
            'classify__min_samples_leaf': [40,50,60],
            'classify__criterion': ['gini', 'entropy']
        }
    ],
    cv=5, scoring='f1')

grid.fit(X,y)

How do I now retrieve PCA details like components and explained_variance from the grid.best_estimator_ model?

Furthermore, I also want to save the best_estimator_ to a file using pickle and later load it. How do I retrieve the PCA details from this loaded estimator? I suspect it will be the same as above.

shikhanshu
    I don't understand the PCA part of your grid: `'reduce_dim__n_components': range(0.7,0.9,0.1)`. What values is this range supposed to produce? – guy Dec 09 '17 at 04:11

1 Answer


grid.best_estimator_ gives you the pipeline refitted with the best parameters found by the search.

Use the pipeline's named_steps attribute to access its individual steps by name.

So grid.best_estimator_.named_steps['reduce_dim'] will give you the fitted PCA object. You can then read the components_ and explained_variance_ attributes from that object like this:

grid.best_estimator_.named_steps['reduce_dim'].components_
grid.best_estimator_.named_steps['reduce_dim'].explained_variance_
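As for the pickle part of the question, the same attribute access should work on the reloaded estimator, since pickling preserves the fitted pipeline. The following is only a minimal sketch under some assumptions: the data is synthetic (make_classification stands in for your X and y), the grid is trimmed and uses explicit lists of values rather than range, and the file name best_estimator.pkl is a hypothetical choice.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: synthetic binary data standing in for the asker's X and y.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, random_state=0)

pipe = Pipeline([
    ('reduce_dim', PCA()),
    ('classify', RandomForestClassifier(random_state=0)),
])

# Assumption: a trimmed grid with explicit value lists to keep the run short.
param_grid = {
    'reduce_dim__n_components': [0.7, 0.8],
    'classify__n_estimators': [10, 20],
}

grid = GridSearchCV(pipe, param_grid=param_grid, cv=3, scoring='f1')
grid.fit(X, y)

# Fitted PCA step of the best pipeline.
pca = grid.best_estimator_.named_steps['reduce_dim']
components = pca.components_
explained_variance = pca.explained_variance_

# Save the best pipeline to a file, then load it back.
# 'best_estimator.pkl' is a hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), 'best_estimator.pkl')
with open(path, 'wb') as f:
    pickle.dump(grid.best_estimator_, f)
with open(path, 'rb') as f:
    loaded = pickle.load(f)

# The loaded object is a Pipeline, so named_steps works the same way.
loaded_pca = loaded.named_steps['reduce_dim']
print(loaded_pca.components_.shape)
print(loaded_pca.explained_variance_)
```

Note that a pickled estimator is generally only safe to load with the same scikit-learn version that created it.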

Vivek Kumar