I use feature selection in combination with a pipeline in scikit-learn. As the feature selection strategy I use SelectKBest.

The pipeline is created and executed like this:

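from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV  # formerly sklearn.grid_search
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

select = SelectKBest(k=5)
clf = SVC(decision_function_shape='ovo')
# Grid over the number of selected features and the SVC penalty parameter C
parameters = dict(feature_selection__k=[1, 2, 3, 4, 5, 6, 7, 8],
                  svc__C=[0.01, 0.1, 1],
                  svc__decision_function_shape=['ovo'])
steps = [('feature_selection', select),
         ('svc', clf)]
pipeline = Pipeline(steps)
cv = GridSearchCV(pipeline, param_grid=parameters)
cv.fit(features_training, labels_training)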

I know that I can get the best parameters afterwards via cv.best_params_. However, this only tells me that k=4 is optimal. How can I find out which features those are?

beta

1 Answer

For your example, you can get the scores of all your features using cv.best_estimator_.named_steps['feature_selection'].scores_. This returns one score per input feature, and comparing them shows which features were chosen: the k with the highest scores. Similarly, you can get the p-values via cv.best_estimator_.named_steps['feature_selection'].pvalues_.
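As a minimal sketch of reading those attributes, the snippet below fits SelectKBest directly on the iris dataset (an assumption for illustration; the question's own data is not shown) and ranks the features by score. The names X, y, k, ranking and top_k are hypothetical. It relies on the default score function, f_classif, which fills in both scores_ and pvalues_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest

# Stand-in data: iris has 4 features (assumption, not the asker's data)
X, y = load_iris(return_X_y=True)

k = 2
selector = SelectKBest(k=k).fit(X, y)  # default score_func is f_classif

scores = selector.scores_    # one score per input feature
pvalues = selector.pvalues_  # one p-value per input feature

# Rank feature indices from highest to lowest score
ranking = np.argsort(scores)[::-1]
# The k highest-scoring features, reported in column order
top_k = np.sort(ranking[:k])

for idx in ranking:
    print(f"feature {idx}: score={scores[idx]:.2f}, p-value={pvalues[idx]:.3g}")
print("top-k feature indices:", top_k.tolist())
```

With a fitted grid search, the same attributes would be read from cv.best_estimator_.named_steps['feature_selection'] instead of a standalone selector.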

EDIT

A better way to get this is to use the get_support method of the SelectKBest class. It returns a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. Call it as follows:

cv.best_estimator_.named_steps['feature_selection'].get_support()
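To make the mask easier to read, it can be turned into column indices or feature names. The following is a sketch under assumptions: it uses the iris dataset as stand-in data and a smaller parameter grid than the question's, and the variable names (feature_names, selected_names, and so on) are illustrative only. get_support(indices=True) returns integer indices instead of a boolean mask.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Stand-in data with known feature names (assumption, not the asker's data)
data = load_iris()
X, y = data.data, data.target
feature_names = np.array(data.feature_names)

pipeline = Pipeline([('feature_selection', SelectKBest(k=1)),
                     ('svc', SVC(decision_function_shape='ovo'))])
parameters = {'feature_selection__k': [1, 2, 3, 4],
              'svc__C': [0.01, 0.1, 1]}
cv = GridSearchCV(pipeline, param_grid=parameters)
cv.fit(X, y)

# The selector inside the refitted best pipeline
selector = cv.best_estimator_.named_steps['feature_selection']
mask = selector.get_support()                 # boolean mask, one entry per feature
indices = selector.get_support(indices=True)  # integer column indices instead
selected_names = feature_names[mask]

print("best k:", cv.best_params_['feature_selection__k'])
print("selected indices:", indices.tolist())
print("selected features:", selected_names.tolist())
```

If the training data is a plain array without names, the indices alone identify the selected columns.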

Abhinav Arora
  • nice. so if the result of `.scores_` is, for instance, `[ 891.65675063 952.43574853 739.36567492 913.33581205 753.59383098 910.65470991 867.7711945 469.26835899]` and I see from the `best_params_` that `k=4`, then I can assume, that the 4 features with the highest values got selected? is this correct? – beta Jul 26 '16 at 18:11
  • I think that is correct. Please check my latest edit to the answer. That is the best way to see the chosen features. – Abhinav Arora Jul 26 '16 at 18:34
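
The check discussed in the comments can be worked through by hand with NumPy. This sketch takes the scores quoted in the comment above and k=4, and assumes there are no tied or NaN scores; the names top_k and mask are illustrative, and get_support() on the fitted selector remains the authoritative answer.

```python
import numpy as np

# Scores quoted in the comment above
scores = np.array([891.65675063, 952.43574853, 739.36567492, 913.33581205,
                   753.59383098, 910.65470991, 867.7711945, 469.26835899])
k = 4  # value reported by best_params_ in the comment

# Indices of the k largest scores, reported in column order
top_k = np.sort(np.argsort(scores)[-k:])

# Equivalent boolean mask, in the same layout that get_support() uses
mask = np.zeros(scores.shape, dtype=bool)
mask[top_k] = True

print(top_k.tolist())  # [0, 1, 3, 5]
print(mask.tolist())
```

Under these assumptions, the features at positions 0, 1, 3 and 5 (counting from zero) are the four with the highest scores.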