I'd like to ask what the formula is for the gain in XGBoost models for multiclass classification tasks. I know that for regression tasks it is calculated as SIMILARITY_LEFT_CHILD + SIMILARITY_RIGHT_CHILD - SIMILARITY_PARENT, and that for binary classification tasks the gain is calculated as ENTROPY_PARENT - AVG(ENTROPY_CHILDREN).
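For reference, here is the general split-gain formula as I understand it from the XGBoost paper (Chen & Guestrin, 2016), where $G$ and $H$ are the sums of the first- and second-order gradients of the loss over the instances in a node, $\lambda$ is the L2 regularization term, and $\gamma$ is the complexity penalty (so the similarity score above would correspond to $G^2/(H+\lambda)$):

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$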
For multiclass tasks the confusion started when I found far less information and, worse, two different explanations. One explanation suggests using cross-entropy in a calculation similar to the binary classification case: https://medium.datadriveninvestor.com/understanding-the-log-loss-function-of-xgboost-8842e99d975d The other explanation suggests using the Bayesian Information Criterion: https://rpubs.com/mharris/multiclass_xgboost
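To make the two proposals concrete (my notation, paraphrasing the sources): the first scores splits using the multiclass log loss (cross-entropy) over $C$ classes, while the second scores candidate models with the Bayesian Information Criterion, where $\hat{L}$ is the maximized likelihood, $k$ the number of parameters, and $n$ the number of observations:

$$\text{CE} = -\sum_{i=1}^{n}\sum_{c=1}^{C} y_{ic}\,\log p_{ic}, \qquad \text{BIC} = k\ln n - 2\ln\hat{L}$$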
Is either of these sources correct? If so, which one?