
I accidentally found that I get a better fit on my binary dataset if I set the objective parameter of XGBClassifier to a multiclass objective and increase the number of classes. At the same time, however, the fit takes longer and consumes twice as much memory.

Unfortunately, I could not create a toy example that reproduces this behaviour. But in the example below I still see different log-losses for different values of the num_class parameter:

import xgboost as xgb
from pandas import DataFrame, Series

X = DataFrame([[0.5, 0.2, 0.1], [0.3, 0.4, 0.1], [0.4, 0.1, 0.5], [0.8, 0.4, 0.4]])
y = Series([0, 0, 1, 1])

# binary target, but a multiclass objective with num_class > 2
regressor = xgb.XGBClassifier(subsample=1, n_estimators=2, max_depth=4,
                              objective="multi:softprob", num_class=4)

regressor.fit(X, y)

num_class=2 results in a log-loss of 0.644, for num_class=3 I get 0.741, and for num_class=10 I get 1.126.
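(For reference, a minimal sketch of how such scores could be computed, assuming the variables from the snippet above and sklearn's log_loss; the question does not show how the numbers were obtained, and the shape of predict_proba's output can vary across xgboost versions:)

from sklearn.metrics import log_loss

# Hypothetical check, not from the original post: predict_proba may return one
# column per configured class here, so with num_class > 2 some probability mass
# can go to classes that never occur and the log-loss on the true labels rises.
proba = regressor.predict_proba(X)
print(proba.shape)
print(log_loss(y, proba, labels=list(range(proba.shape[1]))))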

I suppose it has something to do with some early-stopping criterion or some learning-rate adaptation? Any ideas?

1 Answer


Typically, the lower the log-loss score the better, whereas it seems you have interpreted a higher log-loss score to be better.

The plot below shows the log-loss contribution from a single positive instance where the predicted probability ranges from 0 (the completely wrong prediction) to 1 (the correct prediction). It is apparent from the gentle downward slope towards the right that the log-loss gradually declines as the predicted probability improves. Moving in the opposite direction, though, the log-loss ramps up very rapidly as the predicted probability approaches 0.
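As a rough illustration of that shape (a small NumPy sketch, not part of the original answer):

import numpy as np

# Per-instance log-loss for a positive example is -log(p), where p is the
# probability assigned to the correct class.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"p = {p:.2f}  ->  log-loss contribution = {-np.log(p):.3f}")
# The loss falls slowly as p approaches 1 but blows up as p approaches 0.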

[Plot: log-loss contribution vs. predicted probability for the correct class]

So in your case, num_class=2 will return the 'best' log-loss score, as this is the true number of classes. I recommend this thread for further reading on log-loss scores: https://stats.stackexchange.com/questions/276067/whats-considered-a-good-log-loss

  • Thanks for your answer. In my actual real-world data, the log loss really did get better with an increasing number of classes: it decreased. In my toy example I saw it the other way around; the log loss increased. So I did not misinterpret it. The actual question is why I get different log losses by simply changing the number of classes. – user1488793 Feb 26 '20 at 21:14