I am trying to understand how multiclass classification in XGBoost works. I have read the paper by Chen and Guestrin (2016, https://arxiv.org/abs/1603.02754), but the details are still not clear to me:
Say I want to produce a probabilistic classifier for a 3-category classification task. If I understood correctly, XGBoost fits regression trees as "weak learners", i.e. the components of the boosting model. Therefore, when a new predictor vector is passed to the XGBoost model, each regression tree produces a real value as its "prediction", and the (weighted) combination of these is the boosted model's prediction.
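To make my mental model concrete, here is a sketch of what I think happens at prediction time (the tree outputs and learning rate below are made up for illustration, this is not real XGBoost code):

```python
# Sketch of my understanding: each regression tree maps a feature vector to
# one real-valued leaf score, and the boosted model output is the
# learning-rate-weighted sum of those scores.
import numpy as np

eta = 0.3                                   # learning rate (assumed value)
tree_outputs = np.array([0.8, -0.2, 0.5])   # hypothetical leaf values from 3 trees
score = eta * tree_outputs.sum()            # a single real-valued model output
print(score)
```

So in my picture the model produces one number per sample, which is the source of my confusion below.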
From this question and the derivations in the paper, I gathered that a softmax activation function is applied to the boosted model prediction (a real value?), and that the tree structure (e.g. the splitting points) is determined by optimizing the cross-entropy loss function after the softmax is applied to the model output.
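My understanding of softmax is that it takes a *vector* of K class scores and returns K probabilities summing to 1, which is exactly why a single scalar output confuses me. A minimal sketch of softmax as I understand it:

```python
# Softmax as I understand it: it needs a vector of K real-valued scores
# (one per class) and returns K probabilities that sum to 1.
import numpy as np

def softmax(scores):
    z = np.exp(scores - scores.max())   # subtract max for numerical stability
    return z / z.sum()

p = softmax(np.array([1.2, -0.3, 0.5]))
print(p, p.sum())   # three probabilities summing to 1
```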
What is not clear to me is how exactly three class probabilities are obtained. If the model output is just a single real value (a weighted combination of the individual regression trees' outputs), how can applying the softmax function to it return 3 probabilities?
I am using the XGBoost library in both Python and R, but that probably makes no difference.