Finding label-specific top features for non-linear classifier

Question

Is there any function that gives the top features for each label in a Random Forest / XGBoost classifier? classifier.feature_importances_ only gives the top features for the classifier as a whole.

I'm looking for something similar to classifier.coef_, which gives label-specific top features for SVM and Naive Bayes classifiers in sklearn.

Venkatachalam · Answer 1 · 2019-01-21T07:22:10.257

First, Random Forest, XGBoost, a simple decision tree, or any tree ensemble is an inherently multi-class classification model. It predicts the multi-class output directly, without a wrapper (one-vs-one or one-vs-rest) on top of a binary classifier, which is what logistic regression, SVM, or SGDClassifier would typically do.

Hence, feature_importances_ describes the multi-class model as a whole; it gives no breakdown for individual labels.
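To make the contrast concrete, here is a minimal sketch comparing the shape of coef_ on a linear model with the shape of feature_importances_ on a forest. The iris dataset and the variable names are assumptions chosen for illustration; they are not from the question.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Illustrative data only: 3 classes, 4 features.
X, y = load_iris(return_X_y=True)

linear = LogisticRegression(max_iter=1000).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# One row of coefficients per label for the linear model.
coef_shape = linear.coef_.shape
# A single vector for the whole forest, with no label dimension.
importance_shape = forest.feature_importances_.shape

print(coef_shape)        # (3, 4)
print(importance_shape)  # (4,)
```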

If you really want the feature importance for individual labels, use the OneVsRestClassifier wrapper with a decision tree, Random Forest, or XGBoost model as the estimator. This is not the recommended approach, because the results could be suboptimal compared with a single multi-class tree model.
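A minimal sketch of the one-vs-rest approach, assuming scikit-learn's OneVsRestClassifier around a RandomForestClassifier. The iris dataset and names such as ovr and per_label_importance are placeholders for illustration. Each fitted binary estimator in estimators_ has its own feature_importances_, aligned with classes_.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

# Illustrative data only; substitute your own X (DataFrame) and y.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# One binary forest is trained per label ("this label" vs. "everything else").
ovr = OneVsRestClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
ovr.fit(X, y)

# Rows are features, columns are labels; each column holds the importances
# from the binary forest trained for that label.
per_label_importance = pd.DataFrame(
    {label: est.feature_importances_
     for label, est in zip(ovr.classes_, ovr.estimators_)},
    index=X.columns,
)

# Print the two highest-ranked features for each label.
for label in per_label_importance.columns:
    top = per_label_importance[label].sort_values(ascending=False).head(2)
    print(iris.target_names[label], list(top.index))
```

Note that these importances describe each binary "label vs. rest" problem, so they are not directly comparable to the importances of a single multi-class forest, and training cost grows with the number of labels.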

Some examples here.

score 0 · Accepted Answer · answered Jan 22 '19 at 06:49

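import pandas as pd

# rf is assumed to be a fitted tree-based classifier and X_train the DataFrame it was trained on.
feature_importances = pd.DataFrame(
    rf.feature_importances_,
    index=X_train.columns,
    columns=['importance'],
).sort_values('importance', ascending=False)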

Try this!

One-vs-rest is also a good option, but it takes a lot of time.
