I would like to understand how the "Gain" column is calculated by the xgb.importance function in R, and whether there are scientific articles about it, for example with convergence results (behaviour with correlated features, non-linear relationships, ...).

My objective is to compare this approach with SHAP and other methods for interpreting the impact of features on the output of xgboost models. In my context the output is a Poisson count.

Lolivano

1 Answer

Not at all a complete answer, just a starter:

For each feature j, it calculates

  • the number of splits on feature j (the "Frequency" column), or
  • the total reduction in loss from the splits on feature j (the "Gain" column), whichever you choose to look at.
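
To make the second bullet concrete, here is a sketch in the notation of the XGBoost paper (Chen and Guestrin, 2016), assuming a tree booster with no L1 penalty. $G_L, H_L$ and $G_R, H_R$ are the sums of first and second derivatives of the loss over the observations going to the left and right child of a split $s$, and $\lambda$, $\gamma$ are the `lambda` and `gamma` regularization parameters:

$$\text{gain}(s) = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$

As far as I read the documentation of xgb.importance, the "Gain" column is then the share of the total gain that comes from splits on feature $j$, where $S_j$ (my notation) is the set of splits on feature $j$ and $S$ the set of all splits in the model:

$$\text{Gain}_j = \frac{\sum_{s \in S_j} \text{gain}(s)}{\sum_{s \in S} \text{gain}(s)}$$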

As such, the inverse link is simply part of the prediction step used to evaluate the loss, and hence the gain. Regarding correlated features: you will see their importance as used by the fitted model (the model is never refitted without feature j). This may or may not be what users are interested in.
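
To illustrate how the inverse link enters the gain for a Poisson count, here is a minimal sketch in plain Python (chosen only for brevity, it is not xgboost code). It assumes a log link, the standard second-order split gain with L2 penalty `lam`, and a constant starting score. All function names and the toy data are made up for the illustration, and xgboost's own Poisson objective differs in some details.

```python
import math
from collections import defaultdict

def poisson_grad_hess(y, raw_score):
    """Derivatives of the Poisson negative log-likelihood w.r.t. the raw (log-scale) score."""
    mu = math.exp(raw_score)  # inverse link: raw score -> expected count
    return mu - y, mu         # (gradient, hessian)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Second-order loss reduction of one split (assumed formula, no L1 penalty)."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

def gain_importance(splits):
    """Sum the gains per feature and normalize so the shares add up to one."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    return {feature: gain / grand_total for feature, gain in totals.items()}

# Toy data: one feature x, Poisson-like counts y, constant raw score 0 (so mu = 1).
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [0, 1, 0, 4, 6, 5]
stats = [poisson_grad_hess(yi, 0.0) for yi in y]

# Candidate split at x < 0.5: sum gradients and hessians on each side.
left = [s for xi, s in zip(x, stats) if xi < 0.5]
right = [s for xi, s in zip(x, stats) if xi >= 0.5]
g_l, h_l = sum(g for g, _ in left), sum(h for _, h in left)
g_r, h_r = sum(g for g, _ in right), sum(h for _, h in right)
gain_x1 = split_gain(g_l, h_l, g_r, h_r)

# Hypothetical model with three splits; the gains for the last two are invented.
importance = gain_importance([("x1", gain_x1), ("x2", 0.5), ("x1", 1.0)])
print(importance)
```

Nothing in this calculation refits the model without a feature: the gains are simply read off the splits that were made, which is why the importance of correlated features reflects how the fitted trees happened to use them.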

Michael M