I accidentally found that I get a better fit on my binary dataset if I set the objective parameter of XGBClassifier
to multiclass and increase the number of classes. At the same time, however, the fit takes longer and consumes twice as much memory.
Unfortunately, I could not create a toy example that reproduces the better fit. But in the example below I still see different log-losses for different values of the num_class
parameter:
import pandas as pd
import xgboost as xgb

X = pd.DataFrame([[0.5, 0.2, 0.1], [0.3, 0.4, 0.1], [0.4, 0.1, 0.5], [0.8, 0.4, 0.4]])
y = pd.Series([0, 0, 1, 1])
classifier = xgb.XGBClassifier(subsample=1, n_estimators=2, max_depth=4,
                               objective="multi:softprob", num_class=4)
classifier.fit(X, y)
With num_class=2 I get a log-loss of 0.644, with num_class=3 I get 0.741, and with num_class=10 I get 1.126.
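For reference, a comparison along these lines reproduces the setup (a minimal sketch; evaluating with sklearn.metrics.log_loss on the training predictions is an assumption, and recent xgboost versions may reject a num_class that disagrees with the labels actually present in y):

from sklearn.metrics import log_loss

for num_class in (2, 3, 10):
    clf = xgb.XGBClassifier(subsample=1, n_estimators=2, max_depth=4,
                            objective="multi:softprob", num_class=num_class)
    clf.fit(X, y)
    # multi:softprob yields one probability column per class; with
    # num_class > 2 the extra columns belong to classes absent from y.
    proba = clf.predict_proba(X)
    # labels=range(num_class) tells log_loss how to align the extra columns.
    print(num_class, log_loss(y, proba, labels=list(range(num_class))))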
I suppose it has something to do with an early-stopping criterion or some learning-rate adaptation? Any ideas?