Using smoothed labels from 0 to 1 to train an XGB classifier

Question

I want to train an XGB classifier using smoothed labels between 0 and 1 instead of binary labels.

The native XGB model seems to be able to accept smoothed labels for a binary classifier.

from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
dtrain = xgb.DMatrix(train_data, label=train_label)

test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
dtest = xgb.DMatrix(test_data, label=test_label)

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]

bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)

Output:

[0] train-auc:0.68952   eval-auc:0.53327
[1] train-auc:0.74847   eval-auc:0.49597
[2] train-auc:0.79158   eval-auc:0.45795
...

However, when I tried to use the sklearn wrapper XGBClassifier, I got the following error.


model = XGBClassifier(**param)
model.fit(train_data, train_label)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_12603/1675654556.py in <cell line: 1>()
----> 1 model.fit(train_data, train_label)

~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    618             for k, arg in zip(sig.parameters, args):
    619                 kwargs[k] = arg
--> 620             return func(**kwargs)
    621 
    622         return inner_f

~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
   1464                 or not (self.classes_ == expected_classes).all()
   1465             ):
-> 1466                 raise ValueError(
   1467                     f"Invalid classes inferred from unique values of `y`.  "
   1468                     f"Expected: {expected_classes}, got {self.classes_}"

ValueError: Invalid classes inferred from unique values...

I have 2 questions here:

1. Does the 1st code example actually take the smoothed labels into account during training, or does it just internally convert the real values to 0 or 1?
2. Why doesn't the XGBClassifier method work with smoothed labels? Is it possible to get it to work?

One solution would be to use XGBRegressor instead - i.e. convert this from a classification problem to a regression problem. Would that be a reasonable solution? — Nick ODell, Jul 12 '23 at 23:25
@NickODell, I thought about that approach. However, our current production work flow doesn't support regression models very well, so I'd like to explore if this would be possible using a classification approach. — Allen Qin, Jul 13 '23 at 23:30

score 1 · Answer 1 · answered Jul 14 '23 at 05:03

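Answer 1: In the first code example, train_label and test_label are randomly generated values between 0 and 1, so they are not smoothed versions of real binary labels. With the binary:logistic objective, XGBoost does not convert these labels to 0 or 1. The sigmoid function is applied to the model's raw output to produce a predicted probability, and that probability is compared against the label as given, so the real-valued labels are taken into account during training.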
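
As a rough illustration of why a label of 0.9 is treated differently from 0.6, here is a small NumPy sketch of the per-row gradient and Hessian of the logistic loss, which is, to my understanding, what binary:logistic uses during boosting. The helper names (sigmoid, logistic_grad_hess) are made up for this example and are not part of the XGBoost API.

```python
import numpy as np


def sigmoid(margin):
    """Map a raw score (margin) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-margin))


def logistic_grad_hess(margin, label):
    """Gradient and Hessian of the log loss with respect to the raw score.

    loss = -(y * log(p) + (1 - y) * log(1 - p)), with p = sigmoid(margin)
    grad = p - y
    hess = p * (1 - p)
    The label y enters as-is, so it may be any value in [0, 1].
    """
    p = sigmoid(margin)
    return p - label, p * (1.0 - p)


# Start from a raw score of 0 for every row, i.e. a predicted probability of 0.5.
margin = np.zeros(4)
soft = np.array([0.9, 0.6, 0.2, 0.1])
hard = np.where(soft >= 0.5, 1.0, 0.0)  # thresholded version, for comparison

grad_soft, hess_soft = logistic_grad_hess(margin, soft)
grad_hard, hess_hard = logistic_grad_hess(margin, hard)

print("soft-label gradients:", grad_soft)
print("hard-label gradients:", grad_hard)
```

With soft labels, the row labelled 0.9 gets a larger push towards the positive class than the row labelled 0.6. After thresholding, both rows get the same gradient, so that distinction is lost.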

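Answer 2: XGBClassifier doesn't work with smoothed labels because the scikit-learn wrapper treats y as class labels. It infers the classes from the unique values of y and raises the ValueError shown in the question when they are not the expected integer classes (0 and 1 for a binary problem).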
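
If the soft information has to be kept while still using a classifier interface, one possible workaround (an assumption on my part, not something from the XGBoost documentation) is to duplicate every row: one copy with label 1 and weight y, and one copy with label 0 and weight 1 - y. The weighted log loss on the duplicated data equals the soft-label log loss on the original data. The sketch below builds the expanded arrays with NumPy only and checks that identity numerically; expand_soft_labels and log_loss_terms are hypothetical helper names.

```python
import numpy as np


def expand_soft_labels(X, y_soft):
    """Turn soft labels into hard labels plus sample weights.

    Each row appears twice: once as class 1 with weight y,
    and once as class 0 with weight 1 - y.
    """
    X = np.asarray(X)
    y_soft = np.asarray(y_soft, dtype=float)
    n = len(y_soft)
    X_dup = np.concatenate([X, X], axis=0)
    y_dup = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    w_dup = np.concatenate([y_soft, 1.0 - y_soft])
    return X_dup, y_dup, w_dup


def log_loss_terms(p, y):
    """Per-row log loss for predicted probability p and label y."""
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


rng = np.random.default_rng(0)
X = rng.random((20, 10))
y_soft = rng.random(20)
X_dup, y_dup, w_dup = expand_soft_labels(X, y_soft)

# Check the loss identity for some arbitrary predicted probabilities.
p = rng.uniform(0.05, 0.95, size=20)
soft_loss = log_loss_terms(p, y_soft).sum()
hard_loss = (w_dup * log_loss_terms(np.concatenate([p, p]), y_dup)).sum()
print(soft_loss, hard_loss)
```

The expanded arrays can then be passed to the wrapper as model.fit(X_dup, y_dup, sample_weight=w_dup). I have not verified that this reproduces the xgb.train result exactly, since settings that depend on the number of rows or on the weights may behave differently.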

To convert smoothed labels into binary labels, you can pre-process them with a threshold value, for example 0.5. Note that this discards the information carried by the smoothing: 0.9 and 0.6 both become 1.

Smoothed to binary

from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb

train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
train_label_binary = np.where(train_label >= 0.5, 1, 0)  # Apply threshold to convert smoothed labels to binary labels
dtrain = xgb.DMatrix(train_data, label=train_label_binary)

test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
test_label_binary = np.where(test_label >= 0.5, 1, 0)  # Apply threshold to convert smoothed labels to binary labels
dtest = xgb.DMatrix(test_data, label=test_label_binary)

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]

bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)

Output:

[0] train-auc:0.80500   eval-auc:0.51000
[1] train-auc:0.93500   eval-auc:0.61500
[2] train-auc:0.95000   eval-auc:0.67500
[3] train-auc:1.00000   eval-auc:0.58000
[4] train-auc:1.00000   eval-auc:0.57500
[5] train-auc:1.00000   eval-auc:0.57500
[6] train-auc:1.00000   eval-auc:0.57500
[7] train-auc:1.00000   eval-auc:0.61500
[8] train-auc:1.00000   eval-auc:0.60000
[9] train-auc:1.00000   eval-auc:0.62000

answered Jul 14 '23 at 05:03

Anay

Thanks Anay. For Q1, can you please elaborate on how XGB interprets these labels? I reckon the sigmoid function maps any real number into the range 0 to 1 rather than converting it to a discrete 0 or 1. Given that the input in my example is already between 0 and 1, I'm not sure how that conversion works. My question is whether XGB takes these smoothed labels into account during training. In other words, does it consider 0.9 more 'positive' than 0.6, and 0.1 more 'negative' than 0.2? Or does it use a threshold to convert these labels to discrete 0s and 1s? – Allen Qin Jul 14 '23 at 11:31
The real values between 0 and 1 represent the probabilities or confidences of the positive class. Higher values indicate a higher confidence for the positive class, while lower values indicate a higher confidence for the negative class. XGBoost optimizes the model's parameters to maximize the likelihood of the observed smoothed labels given the data. It learns to assign higher probabilities to instances with higher smoothed labels (more positive) and lower probabilities to instances with lower smoothed labels (more negative). In short, it treats the smoothed labels as continuous probabilities. – Anay Jul 14 '23 at 12:16
Thanks, that's what I assume XGB does. Do you have a reference or source for this? – Allen Qin Jul 14 '23 at 22:33
Some of the references: 1. https://stackoverflow.com/questions/48011742/xgboost-leaf-scores 2. https://towardsdatascience.com/de-mystifying-xgboost-part-i-f37c5e64ec8e 3. https://github.com/dmlc/xgboost/issues/1763 4. https://stats.stackexchange.com/questions/417806/calibration-curve-of-xgboost-for-binary-classification 5. https://machinelearningmastery.com/xgboost-loss-functions/ 6. https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters 7. https://datascience.stackexchange.com/questions/61761/how-does-xgboost-use-softmax-as-an-objective-function – Anay Jul 15 '23 at 05:05
Thank you for adding the references. The particular reference or code I'm after is how an XGB classification model (when the objective function is set to `binary:logistic`) handles continuous label values between 0 and 1 internally during training. I had a quick scan of the referenced links and couldn't find it. Please let me know if I missed anything. – Allen Qin Jul 17 '23 at 12:17