I want to train an XGBoost classifier using smoothed labels between 0 and 1 instead of binary labels.
The native XGBoost API seems to accept smoothed labels for a binary classifier:
from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb
# Random features with fractional (smoothed) labels drawn from [0, 1)
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
dtrain = xgb.DMatrix(train_data, label=train_label)
test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
dtest = xgb.DMatrix(test_data, label=test_label)
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]
# The native API trains without complaint on the fractional labels
bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)
[0] train-auc:0.68952 eval-auc:0.53327
[1] train-auc:0.74847 eval-auc:0.49597
[2] train-auc:0.79158 eval-auc:0.45795
...
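One rough way I thought of to probe whether the fractional labels actually influence training (my first question below), rather than being rounded internally, is to train once on the smoothed labels and once on a thresholded copy and compare the predictions. This is just a sketch with my own variable names, and I'm not sure the comparison is conclusive:
# Reuses train_data, train_label, and dtest from above (hypothetical check)
check_param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
dtrain_smooth = xgb.DMatrix(train_data, label=train_label)
dtrain_hard = xgb.DMatrix(train_data, label=(train_label > 0.5).astype(float))
bst_smooth = xgb.train(params=check_param, dtrain=dtrain_smooth, num_boost_round=10)
bst_hard = xgb.train(params=check_param, dtrain=dtrain_hard, num_boost_round=10)
# If the predictions differ, the fractional labels are presumably not
# simply being rounded to 0/1 internally.
print(np.abs(bst_smooth.predict(dtest) - bst_hard.predict(dtest)).max())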
However, when I tried to use the sklearn wrapper XGBClassifier, I got the following error.
model = XGBClassifier(**param)
model.fit(train_data, train_label)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_12603/1675654556.py in <cell line: 1>()
----> 1 model.fit(train_data, train_label)
~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
618 for k, arg in zip(sig.parameters, args):
619 kwargs[k] = arg
--> 620 return func(**kwargs)
621
622 return inner_f
~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
1464 or not (self.classes_ == expected_classes).all()
1465 ):
-> 1466 raise ValueError(
1467 f"Invalid classes inferred from unique values of `y`. "
1468 f"Expected: {expected_classes}, got {self.classes_}"
ValueError: Invalid classes inferred from unique values...
I have 2 questions here:
- Does the first code example actually take the smoothed labels into account during training, or does it just internally convert the real values to 0 or 1?
- Why doesn't the XGBClassifier wrapper work with smoothed labels? Is it possible to get it to work?
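For the second question, one workaround I've been considering (I'm not sure it is actually equivalent) is to use XGBRegressor with the logistic objective, since the regressor wrapper does not try to infer discrete classes from y. A minimal sketch with my own parameter choices:
from xgboost import XGBRegressor
import numpy as np
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
# XGBRegressor skips the class-inference check that XGBClassifier.fit
# performs, but the booster still accepts binary:logistic, so the
# fractional labels in [0, 1] are passed through unchanged.
model = XGBRegressor(max_depth=2, learning_rate=1, n_estimators=10,
                     objective='binary:logistic')
model.fit(train_data, train_label)
pred = model.predict(np.random.rand(5, 10))  # outputs in (0, 1) after the sigmoid
But I don't know whether this loses any classifier-specific behaviour (e.g. predict_proba), so I'd still like to understand why XGBClassifier rejects the labels.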