
I basically have a Python script that tries a variety of dimensionality reduction techniques combined with a variety of classifiers. I attempted to collect the most informative features for each classifier:

if 'forest' in type(classifier).__name__.lower():
    # Tree ensembles expose feature_importances_; sort indices by importance
    importances = classifier.feature_importances_
    coefs_with_fns = numpy.argsort(importances)[::-1]
else:
    # Linear models expose coef_; for a binary classifier it has shape
    # (1, n_features), so take the first row before pairing with column names
    coefs_with_fns = sorted(zip(classifier.coef_[0], reduced_training.columns))

While this works in principle, the output is just a series of integers, which (I assume) correspond to the column numbers in the feature array passed to the classifier. Which brings me to the problem: this array is the direct result of a dimensionality reduction step, which throws away all the previously attached column labels.

So my question is: is there a way to trace back the result of the dimensionality reduction to the actual columns/labels in the original dataset?

Max Uppenkamp

1 Answer


You can't.

When you do dimensionality reduction (like PCA), what you get is a set of new vectors, not a subset of the original feature set, and information is lost in the process. The original features are projected from a high-dimensional space into a new, lower-dimensional one. You can't go back.
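A minimal sketch of that point (not the asker's script, just an illustrative example with random data): after PCA, each output column is a weighted mix of *all* original features, so no single output column corresponds to one original column.

```python
import numpy
from sklearn.decomposition import PCA

rng = numpy.random.RandomState(0)
X = rng.rand(100, 5)               # 5 original features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)   # 2 new axes, not 2 original columns

print(X_reduced.shape)             # (100, 2)
print(pca.components_.shape)       # (2, 5): each new axis weights every original feature
```

The `components_` matrix lets you inspect how much each original feature contributes to each new axis, but a contribution weight is not the same as a traceable column label.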

Note that Dimensionality Reduction (or Feature Extraction) is different from Feature Selection: in Feature Selection you keep a subset of the original feature set, so the surviving columns retain their identity.

Edit: in case you decide to use a Feature Selection technique, look at this answer in order to do what you want.
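As a hedged sketch of that route (using `SelectKBest` as an example selector; any scikit-learn selector exposing `get_support()` works the same way), the boolean mask maps the kept columns straight back to the original labels:

```python
import pandas
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X = pandas.DataFrame(iris.data, columns=iris.feature_names)

# Keep the 2 features with the highest ANOVA F-score
selector = SelectKBest(f_classif, k=2).fit(X, iris.target)

# get_support() returns a boolean mask over the ORIGINAL columns
selected = X.columns[selector.get_support()]
print(list(selected))
```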

Christos Baziotis