I am trying to do feature selection as part of a scikit-learn pipeline, in a multi-label scenario. My goal is to select the best K features, for some given K.
It might be simple, but I don't understand how to get the indices of the selected features in such a scenario.
In a regular (single-label) scenario I could do something like this:
from sklearn.feature_selection import SelectKBest, f_classif
anova_filter = SelectKBest(f_classif, k=10)
anova_filter.fit_transform(data.X, data.Y)
anova_filter.get_support()
But in a multilabel scenario my labels have dimensions #samples X #unique_labels, so fit and fit_transform raise the following exception: ValueError: bad input shape
That makes sense, because it expects labels of dimension [#samples].
In the multilabel scenario, it seems reasonable to do something like this:
clf = Pipeline([('f_classif', SelectKBest(f_classif, k=10)),('svm', LinearSVC())])
multiclf = OneVsRestClassifier(clf, n_jobs=-1)
multiclf.fit(data.X, data.Y)
But then the object I get is of type sklearn.multiclass.OneVsRestClassifier, which doesn't have a get_support method. How do I get the trained SelectKBest model when it is used inside a pipeline?
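
For reference, here is a minimal sketch of one way this might be approached. It relies on OneVsRestClassifier keeping one fitted clone of the pipeline per label in its estimators_ attribute, and on Pipeline exposing its steps by name through named_steps. The synthetic data from make_multilabel_classification is only an assumed stand-in for data.X and data.Y, and the variable names support_per_label and selected_indices are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Assumption: synthetic data standing in for data.X / data.Y
# (200 samples, 20 features, 3 labels in indicator-matrix form).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=3, random_state=0
)

clf = Pipeline([('f_classif', SelectKBest(f_classif, k=10)),
                ('svm', LinearSVC())])
multiclf = OneVsRestClassifier(clf)
multiclf.fit(X, Y)

# estimators_ holds one fitted Pipeline per label column of Y;
# each pipeline's SelectKBest step is reachable by its step name.
support_per_label = np.array([
    est.named_steps['f_classif'].get_support()
    for est in multiclf.estimators_
])

# Convert each boolean mask into the indices of the selected features.
selected_indices = [np.flatnonzero(mask) for mask in support_per_label]
```

Note that with this setup each label gets its own selector, so the K selected features can differ from label to label; there is no single shared get_support() result for the whole OneVsRestClassifier.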