
I basically have a Python script that tries a variety of dimensionality reduction techniques combined with a variety of classifiers. I attempted to collect the most informative features for each classifier:

if 'forest' in type(classifier).__name__.lower():
    # Tree ensembles expose feature_importances_; sort indices by importance
    importances = classifier.feature_importances_
    coefs_with_fns = numpy.argsort(importances)[::-1]
else:
    # Linear models expose coef_; for a binary classifier it has shape
    # (1, n_features), so take the first row before pairing with column names
    coefs_with_fns = sorted(zip(classifier.coef_[0], reduced_training.columns))

While this works in principle, the output is just a series of integers, which (I assume) correspond to the column numbers in the feature array passed to the classifier. Which brings me to the problem: this array is the direct result of a dimensionality reduction step, which throws away all the previously attached column labels.

So my question is: is there a way to trace back the result of the dimensionality reduction to the actual columns/labels in the original dataset?

Max Uppenkamp

1 Answer


You can't.

When you do dimensionality reduction (like PCA), what you get is a set of new vectors, not a subset of the original feature set, and information is lost in the process. The original features are projected from a high-dimensional space into a new, lower-dimensional one. You can't go back.
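A minimal sketch of that point (not the asker's script, just an illustrative example with random data): after PCA, each output column is a weighted mix of *all* original features, so no single output column corresponds to one original column.

```python
import numpy
from sklearn.decomposition import PCA

rng = numpy.random.RandomState(0)
X = rng.rand(100, 5)               # 5 original features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)   # 2 new axes, not 2 original columns

print(X_reduced.shape)             # (100, 2)
print(pca.components_.shape)       # (2, 5): each new axis weights every original feature
```

The `components_` matrix lets you inspect how much each original feature contributes to each new axis, but a contribution weight is not the same as a traceable column label.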

Note that Dimensionality Reduction (or Feature Extraction) is different from Feature Selection: in Feature Selection you keep a subset of the original feature set, so the surviving columns retain their identity.

Edit: in case you decide to use a Feature Selection technique, look at this answer in order to do what you want.
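As a hedged sketch of that route (using `SelectKBest` as an example selector; any scikit-learn selector exposing `get_support()` works the same way), the boolean mask maps the kept columns straight back to the original labels:

```python
import pandas
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X = pandas.DataFrame(iris.data, columns=iris.feature_names)

# Keep the 2 features with the highest ANOVA F-score
selector = SelectKBest(f_classif, k=2).fit(X, iris.target)

# get_support() returns a boolean mask over the ORIGINAL columns
selected = X.columns[selector.get_support()]
print(list(selected))
```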

Christos Baziotis