
I have a highly unbalanced dataset and am wondering where to account for the weights, so I am trying to understand the difference between the scale_pos_weight argument of XGBClassifier and the sample_weight parameter of the fit method. I would appreciate an intuitive explanation of the difference between the two, whether they can be used simultaneously, and how to choose between the approaches.

The documentation indicates that scale_pos_weight:

control the balance of positive and negative weights ... a typical value to consider: sum(negative cases) / sum(positive cases)

Example:

from xgboost import XGBClassifier

LR = 0.1
NumTrees = 1000

# class-level weighting: every positive example counts 14 times as much as a negative one
xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, scale_pos_weight=14, learning_rate=LR,
                         n_estimators=NumTrees, max_depth=5,
                         objective='binary:logistic', subsample=1)
xgbmodel.fit(X_train, y_train)
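
The value 14 above presumably corresponds to the ratio the documentation suggests. A minimal sketch (assuming `y_train` is a 0/1 NumPy array; not part of the original code) of deriving it from the training labels instead of hard-coding it:

import numpy as np

# negatives-to-positives ratio, per the docs' suggested heuristic
neg_count = np.sum(y_train == 0)
pos_count = np.sum(y_train == 1)
ratio = neg_count / pos_count   # roughly 14 for a dataset imbalanced about 14:1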

OR

from xgboost import XGBClassifier

LR = 0.1
NumTrees = 1000

# no scale_pos_weight here; the weighting is supplied per example via sample_weight in fit()
xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, learning_rate=LR, n_estimators=NumTrees,
                         max_depth=5, objective='binary:logistic',
                         subsample=1)
xgbmodel.fit(X_train, y_train,sample_weight=weights_train)
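
(`weights_train` is not defined in the snippet; it is presumably an array with one weight per training example, aligned with `y_train`.) A minimal sketch, purely as an assumption, of reproducing the class-level weighting of the first example through per-sample weights:

import numpy as np

# one weight per training example: 14 for the positive class, 1 for the negative class
weights_train = np.where(y_train == 1, 14.0, 1.0)

xgbmodel.fit(X_train, y_train, sample_weight=weights_train)

sklearn's compute_sample_weight('balanced', y_train) can produce a similar class-balanced array if you prefer not to pick the factor by hand.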

1 Answer


The sample_weight parameter allows you to specify a different weight for each training example. The scale_pos_weight parameter lets you provide a weight for an entire class of examples (the "positive" class).

These correspond to two different approaches to cost-sensitive learning. If you believe that the cost of misclassifying positive examples (missing a cancer patient) is the same for all positive examples (but higher than the cost of misclassifying negative ones, e.g. telling someone they have cancer when they actually don't), then you can specify one single weight for all positive examples via scale_pos_weight.

XGBoost treats labels = 1 as the "positive" class. This is evident from the following piece of code:

if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight

See this question.

The other scenario is where you have example-dependent costs. One example is detecting fraudulent transactions. Not only is a false negative (missing a fraudulent transaction) more costly than a false positive (blocking a legitimate transaction), but the cost of a false negative is also proportional to the amount of money being stolen. So you want to give larger weights to positive (fraudulent) examples with higher amounts. In this case, you can use the sample_weight parameter to specify example-specific weights.
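
A minimal sketch of that idea (the `amounts_train` array and the scaling rule are assumptions made up for illustration, not part of the question): fraudulent examples are weighted by the transaction amount, legitimate ones keep a weight of 1.

import numpy as np

# amounts_train: hypothetical array of transaction amounts, aligned with y_train
# positives (fraud) are weighted by the amount at risk, negatives get weight 1
weights_train = np.where(y_train == 1,
                         amounts_train / amounts_train.mean(),
                         1.0)

xgbmodel.fit(X_train, y_train, sample_weight=weights_train)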

  • Hi, would you be able to tell me how you calculate the values in both cases? Also, what are the positive and negative instances in `scale_pos_weight`? Thanks – Alain Michael Janith Schroter Apr 15 '19 at 06:55
  • 4
    There is no standard way to "calculate" values for these weights. When weighing the entire positive class, XGBoost documentation suggests sum(negative instances) / sum(positive instances) as "a typical value to consider". This is in principle a hyperparameter to tune. For weighing individual instances, it is totally up to you to decide what the cost of misclassification is for those instances. For examples in detecting fraud in credit card transactions, you could say that the cost of missing a fraudulent transaction is proportional to the amount of money that was stolen – Milad Shahidi Apr 15 '19 at 18:07
  • Thanks a lot. Also, would you be able to tell me what a positive instance means? Is it where the class value is 1, and a negative instance where the class value is 0? But the assignment of 0 and 1 to classes could be arbitrary, right? Thanks – Alain Michael Janith Schroter Apr 16 '19 at 11:01
  • 2
    You're right. In the case of XGBoost, y=1 is treated as the positive class. I updated the answer and included this. – Milad Shahidi Apr 16 '19 at 21:47
  • Can scale_pos_weight and sample_weight be used simultaneously? Are there any traps here? – Glue Mar 14 '23 at 10:43