
I'm working on a binary classification dataset and applying an XGBoost model to the problem. Once the model is trained, I plot the feature importance and one of the trees from the underlying boosted tree ensemble. Please find these plots below.

[Plots: feature importance and a single tree from the trained model]

Questions

  • If I take a test set of, say, 10 datapoints, would the importance of the features vary from datapoint to datapoint when computing each datapoint's predict_proba score?
  • Drawing an analogy to class activation maps in CNNs, which vary from datapoint to datapoint: does the ordering and relative importance of each feature remain the same when the model runs on multiple datapoints, or does it vary?
Piyush Makhija
  • The feature importance in xgboost is computed from the training data. For alternative methods of computing xgboost feature importance, see: https://mljar.com/blog/feature-importance-xgboost/ The feature importance will be unchanged for any `test` data; it is a property of the trained model. If you would like to get decision plots for your test data, please take a look at the SHAP package or MLJAR AutoML https://github.com/mljar/mljar-supervised (it computes permutation-based importance and SHAP decision plots for the best/worst predictions). – pplonski Feb 15 '21 at 10:35
  • Thanks @pplonski. I did compute the model's feature importance using the xgboost package and the shap library, but what I'm looking for is decision plots for test data, i.e. why a test datapoint was classified as class 0 or 1. Will check out the MLJAR AutoML package ... – Piyush Makhija Feb 16 '21 at 05:37

1 Answer


What do you mean by "datapoint"? Is a datapoint a single case/subject/patient/etc? If so:

  1. The feature importance plot and the tree you plotted both relate only to the model; they are independent of the test set. Finding out which features were important in classifying a specific subject/case/datapoint in the test set is a more challenging task (see e.g. XGBoostExplainer: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211).

  2. The ordering and relative importance of each feature are different for each subject/case/datapoint (see above), and there is no 'class activation map' in xgboost: all of the data is analysed, and data deemed 'not important' simply does not contribute to the final decision.

EDIT

Further example of XGBoostExplainer: example_1.png

jared_mamrot
  • Yes, by datapoint I do mean a single instance in the test set. Will try out XGBoostExplainer to see if it is able to help me understand how and why the model scored a single datapoint as a positive or negative label. The intention is to identify which feature contributed the most, or was the biggest deciding factor, for the final classification output. – Piyush Makhija Feb 16 '21 at 05:41
  • Yes, in that case it sounds like XGBoostExplainer will help you achieve that. I've updated my answer to include an example showing how model predictions are calculated for two subjects from a test set, and how all the features ultimately combine to produce the final prediction, although the importance of the different features changes for different subjects. – jared_mamrot Feb 16 '21 at 06:09