
Is there any function that gives the top features of each label in a Random Forest/XGBoost classifier? The classifier.feature_importances_ attribute only gives the top features for the classifier as a whole.

I'm looking for something similar to classifier.coef_, which gives label-specific top features for SVM and Naive Bayes classifiers in sklearn.

Parvathy Sarat

2 Answers


Firstly, Random Forest/XGBoost, or even a simple DecisionTree or any tree ensemble, is an inherently multi-class classification model. Hence it predicts the multi-class output without using any wrapper (one-vs-one / one-vs-rest) on top of a binary classifier (which is what logistic regression/SVM/SGDClassifier would do).

Hence, you can only get the feature importance for the overall multi-class classification, not for individual labels.

If you really want to know the feature importance for individual labels, then use the OneVsRestClassifier wrapper with DecisionTree/RandomForest/XGBoost as the estimator. This is not the recommended approach, because the results could be suboptimal compared with a single multi-class tree model.

Some examples here.
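To make the one-vs-rest suggestion concrete, here is a minimal sketch: the wrapper fits one binary RandomForest per class, and each fitted estimator exposes its own feature_importances_. The dataset and parameter values below are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

# Toy multi-class data (illustrative only)
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# One binary RandomForest per class; each has its own importances
ovr = OneVsRestClassifier(RandomForestClassifier(n_estimators=50,
                                                 random_state=0))
ovr.fit(X, y)

# Top-3 feature indices for each label
for label, est in zip(ovr.classes_, ovr.estimators_):
    top = est.feature_importances_.argsort()[::-1][:3]
    print(f"label {label}: top features {top.tolist()}")
```

Note that each per-label importance vector is computed from a different binary problem, so the numbers are not directly comparable to the single multi-class model's importances.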

Venkatachalam
import pandas as pd

# rf: an already-fitted RandomForestClassifier
# X_train: the training features as a pandas DataFrame
feature_importances = pd.DataFrame(rf.feature_importances_,
                                   index=X_train.columns,
                                   columns=['importance']).sort_values('importance',
                                                                       ascending=False)

Try with this!
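For a self-contained run of the snippet above, here is the same idea end to end on a built-in dataset (the iris data is just a stand-in for your own X_train and labels):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in training data with named columns
iris = load_iris(as_frame=True)
X_train, y_train = iris.data, iris.target

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Overall (not per-label) importances, sorted most important first
feature_importances = pd.DataFrame(rf.feature_importances_,
                                   index=X_train.columns,
                                   columns=['importance']).sort_values('importance',
                                                                       ascending=False)
print(feature_importances)
```

Keep in mind these are impurity-based importances for the whole multi-class model, not for any single label.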

Or one-vs-rest is also a good option, but it takes a lot of time.