Attribute's predictive capacity for a particular target in Python, using feature selection in Sklearn

Question

Are there any feature selection methods in Scikit-Learn (or algos in general) that give weights of an attribute's ability/predictive-capacity/importance to predict a specific target? For example, the from sklearn.datasets import load_iris, ranking each of the 4 attributes weights to predict the 3 iris species separately but for much more complex datasets w/ ~1k-10k attributes.

I'm looking for something analogous to the feature_importances_ from RandomForestClassifier. However, RandomForestClassifer gives weights to each attribute for the entire prediction process. The weights do not need to add up to one but I want to find a way to correlate a specific subset of attributes to a specific target.

First I tried "overfitting" the models to enrich for a specific target but the results didn't seem to change much between targets. Second, I tried going the ordination route by finding which attributes have the greatest variation but that doesn't directly translate to predictive capacity. Third, I tried sparse models but I encountered the same problem as using feature_importances_.

A link to an example or tutorial that does exactly this is sufficient. Possibly a tutorial on how to traverse decision trees in a random forest and store the nodes that are predictive of specific targets.

score 4 · Answer 1 · edited Apr 13 '17 at 12:44

Single targets

Most models are hardly black boxes, so if you are interested in a specific target, you could simply look at the coefficients of the model and do the model calculation by hand to understand how the model came to its output. E.g.:

For a linear model you simply need to multiply with the coefficients and add the bias
For a neural network you need to know all the coefficients and activation functions and do a few calculations to have a look at how the inputs are translated into new 'features' in the hidden layers and then finally outputs
For a random forest you need to look at the decision boundaries of all the trees in the forest
Etc.

Based on such analysis, you could decide what inputs you consider most important.

Sensitivity analysis

More useful, perhaps, would be to look at how the model output changes when your input values change. This will give you a higher-level insight into how important and sensitive the inputs are. This concept is called sensitivity analysis. For most methods, you could simply do some random sampling on the inputs and analyze the outputs.

This can be useful for feature selection, as insensitive inputs are candidates for pruning.

Looking back into the model

Sensitivity analysis is based on the idea of perturbing the input to the model to learn something about how the model comes up with its output. The other way of looking at things would be to take the output and reason backwards into the model and finally the inputs. Such an approach is:

Highly specific to the model technique in question
Complex, since the more non-linear a model is, and the more feature interactions the model has, the harder it is to 'untangle things'.

For a discussion specific to Random Forests, have a look at this Q&A.

Visualization techniques can help. Example from a neural network tool that could give insight: http://playground.tensorflow.org/

General feature importance

For general feature importance, i.e. over all targets, you can look at this part of the scikit-learn documentation.

The example here shows how you can do univariate feature selection with the F-test for feature scoring.

Is this target specific? Or does it apply the prediction process in general? — O.rka, Nov 25 '16 at 18:46
Updated my answer to better address target-specific reasoning. — Def_Os, Nov 28 '16 at 19:02
Thanks for the updates on your answer. The single targets section gives some insight on how it could possibly be done but it is basically saying to build the models from scratch and cataloging the data. The sensitivity analysis and looking back into the model sections seem to require intense combinatorics to try all of the variations and their predictive capacity. The general feature importance is what I was trying to avoid. Although, the f-test is very interesting and I didn't know that exists. Thank you for that. Up-voted. — O.rka, Dec 04 '16 at 19:24

score 2 · Answer 2 · answered Nov 24 '16 at 01:57

I would manually construct separate binary classification models for each of your different possible target values and compare the models. You could possibly normalize the values, however the numerical values themselves are less informative that the ordering of the variables.

Also you might want to look at using a logistic regression model for a different way of calculating your feature importances.

Attribute's predictive capacity for a particular target in Python, using feature selection in Sklearn

2 Answers2