Is there a way to set different class weights for the XGBoost classifier? For example, in sklearn's RandomForestClassifier this is done via the `class_weight` parameter.
-
NOTE: All of the solutions below no longer work, as sample_weight is not supported anymore. – SriK Aug 05 '20 at 14:18
-
scale_pos_weight is the right parameter. Look at my answer below. – SriK Aug 05 '20 at 14:44
-
@SriK yep, but it only works for binary classification problems – onofricamila Aug 06 '20 at 18:29
-
@SriK I am not quite a staff/senior in machine learning, but based on what I see in the scikit-learn version of XGBoost, we do have `sample_weight` available, and it just worked fantastically well for my research on rare diseases a few minutes ago. https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn – Simon Provost Nov 13 '21 at 14:45
7 Answers
For sklearn version < 0.19
Just assign each entry of your train data its class weight. First get the class weights with `class_weight.compute_class_weight` from sklearn, then assign each row of the train data its appropriate weight.

I assume here that the train data has a column `class` containing the class number, and that there are `nb_classes` classes, numbered from 1 to `nb_classes`.
import numpy as np
from sklearn.utils import class_weight

classes_weights = list(class_weight.compute_class_weight('balanced',
                                                         np.unique(train_df['class']),
                                                         train_df['class']))

weights = np.ones(y_train.shape[0], dtype='float')
for i, val in enumerate(y_train):
    weights[i] = classes_weights[val - 1]  # classes are numbered 1..nb_classes

xgb_classifier.fit(X, y, sample_weight=weights)
Update for sklearn version >= 0.19
There is a simpler solution:
from sklearn.utils import class_weight

classes_weights = class_weight.compute_sample_weight(
    class_weight='balanced',
    y=train_df['class']
)

xgb_classifier.fit(X, y, sample_weight=classes_weights)

-
I get your answer and it works fine, but how can you do this when you have to use **Pipeline**? You can't use the `fit` method directly. – Deshwal Nov 11 '21 at 08:33
-
@Deshwal As this is a different type of inquiry and I would not like to delve into a response that is unrelated to the original, here's a decent article discussing such a thing: https://towardsdatascience.com/pipelines-custom-transformers-in-scikit-learn-the-step-by-step-guide-with-python-code-4a7d9b068156 – Simon Provost Nov 13 '21 at 14:46
-
For using `sample_weight` with `Pipeline`, there is an example here: https://stackoverflow.com/a/36224909/407108 – Justas Mar 13 '23 at 17:32
When using the sklearn wrapper, there is a parameter for weight. Example:
import xgboost as xgb

exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)
where the parameter should be array-like, of length N, equal to the length of the target.
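For instance, a minimal sketch of building such an array (the labels and weight values here are illustrative assumptions, not part of the original answer):

import numpy as np

# One weight per training row; y is assumed to be the label array.
class_weights = {0: 1.0, 1: 5.0}  # illustrative per-class weights
sample_weights_data = np.array([class_weights[label] for label in y])
assert len(sample_weights_data) == len(y)  # array-like, length N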

-
How can you use it with **Pipeline**? As you can't use `fit` directly inside a pipeline – Deshwal Nov 11 '21 at 08:32
-
@Deshwal As this is a different type of inquiry and I would not like to delve into a response that is unrelated to the original, here's a decent article discussing such a thing: https://towardsdatascience.com/pipelines-custom-transformers-in-scikit-learn-the-step-by-step-guide-with-python-code-4a7d9b068156 – Simon Provost Nov 13 '21 at 14:43
I recently ran into this problem, so I thought I'd leave a solution I tried.
import numpy as np
from xgboost import XGBClassifier

# Manually handling imbalance. Below is the same as computing
# float(18501)/392318 on the training dataset.
# We are going to inversely assign the weights.
weight_ratio = float(len(y_train[y_train == 0])) / float(len(y_train[y_train == 1]))

# Use a float dtype; an int array would truncate the weights to zero.
w_array = np.array([1.0] * y_train.shape[0])
w_array[y_train == 1] = weight_ratio
w_array[y_train == 0] = 1 - weight_ratio

xgc = XGBClassifier()
xgc.fit(x_df_i_p_filtered, y_train, sample_weight=w_array)
Not sure why, but the results were pretty disappointing. Hope this helps someone.
Reference: https://www.programcreek.com/python/example/99824/xgboost.XGBClassifier

-
Should be `w1 = np.array([1.0] * y_train.shape[0])`, initializing the numpy array's dtype as a float. Otherwise the following statements will result in a numpy array containing all zeros. – Diego Amicabile Dec 23 '18 at 18:49
`compute_sample_weight("balanced", y)` maps each label in `y` to a weight inversely proportional to its class frequency, returning the per-row array that `sample_weight` expects:

from sklearn.utils.class_weight import compute_sample_weight

xgb_classifier.fit(X, y, sample_weight=compute_sample_weight("balanced", y))

-
Please add some explanation to your answer before getting to the code implementation. – Ofek Hod May 29 '20 at 12:08
The answers here are outdated. The sample_weight parameter is no longer supported. It's replaced with scale_pos_weight.

Rather, just set scale_pos_weight = sum(negative instances) / sum(positive instances).
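A minimal sketch of that computation, assuming binary 0/1 labels in a NumPy array y_train (the variable names X_train and y_train are illustrative, not from the answer):

import numpy as np
import xgboost as xgb

# scale_pos_weight = sum(negative instances) / sum(positive instances)
ratio = float(np.sum(y_train == 0)) / float(np.sum(y_train == 1))

clf = xgb.XGBClassifier(scale_pos_weight=ratio)
clf.fit(X_train, y_train)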

You can alternatively use the `scale_pos_weight` hyperparameter, as discussed in the XGBoost docs. The advantage of this approach is that you don't have to construct the sample weight vector and don't have to pass it in at `fit` time.
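For illustration, a hedged sketch of that advantage: since `scale_pos_weight` is a constructor argument rather than a fit-time keyword, it also slots into a Pipeline without any fit-parameter routing (the weight value and variable names below are assumptions):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import xgboost as xgb

# The class-imbalance handling lives in the estimator's configuration,
# so nothing extra is passed to fit().
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", xgb.XGBClassifier(scale_pos_weight=10.0)),  # illustrative ratio
])
pipe.fit(X_train, y_train)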

-
Interesting. I tried it with my problem, and my question is: how different is this method from sample_weight in the fit method? If you have an insight about this, it would be amazing. – Simon Provost Nov 13 '21 at 13:44
Similar to the @Firas Omrane and @Pramit answers, but I think it is slightly more pythonic:
import numpy as np
from sklearn.utils import class_weight

class_weights = dict(
    zip(
        [0, 1],
        class_weight.compute_class_weight(
            'balanced', classes=np.unique(train['class']), y=train['class']
        ),
    )
)

xgb_classifier.fit(X, train['class'], sample_weight=class_weights)

-
The format of this `class_weights` is not the one expected by `xgb`. Could you elaborate on whether anything extra needs to be done to make it work? Thanks – juanbretti May 17 '21 at 08:17
-
@juanbretti Using Skibee's response will not work with XGBoost's scikit-learn implementation, since it requires a list the same size as your class target, holding the weight value for the i-th sample instead of 1, 0, or whatever the unique values in your column are. Thus, this answer is ideal for logging, for example, which class weights should be applied to your unique values. However, I would recommend using `class_weight.compute_sample_weight` when using it with the XGBoost scikit-learn implementation. Do you understand, or still confused? – Simon Provost Nov 13 '21 at 14:25