Is there a way to set different class weights for the XGBoost classifier? For example, in sklearn's RandomForestClassifier this is done via the `class_weight` parameter.
-
NOTE: All of the solutions below no longer work, as sample_weight is not supported anymore. – SriK Aug 05 '20 at 14:18
-
scale_pos_weight is the right parameter. Look at my answer below. – SriK Aug 05 '20 at 14:44
-
@SriK yep, but it only works for binary classification problems – onofricamila Aug 06 '20 at 18:29
-
@SriK I am not quite a staff/senior in machine learning, but based on what I see in the scikit-learn version of XGBoost, we do have `sample_weight` available, and it just worked fantastically well for my research on rare diseases a few minutes ago. https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn – Simon Provost Nov 13 '21 at 14:45
7 Answers
For sklearn version < 0.19
Just assign each entry of your train data its class weight. First get the class weights with `class_weight.compute_class_weight` from sklearn, then assign each row of the train data its appropriate weight.

I assume here that the train data has a column `class` containing the class number, and that there are `nb_classes` classes, numbered from 1 to `nb_classes`.
import numpy as np
from sklearn.utils import class_weight

classes_weights = list(class_weight.compute_class_weight('balanced',
                                                         np.unique(train_df['class']),
                                                         train_df['class']))

weights = np.ones(y_train.shape[0], dtype='float')
for i, val in enumerate(y_train):
    weights[i] = classes_weights[val - 1]  # classes are numbered 1..nb_classes

xgb_classifier.fit(X, y, sample_weight=weights)
Update for sklearn version >= 0.19
There is a simpler solution:
from sklearn.utils import class_weight

classes_weights = class_weight.compute_sample_weight(
    class_weight='balanced',
    y=train_df['class']
)

xgb_classifier.fit(X, y, sample_weight=classes_weights)

-
I get your answer and it works fine, but how can you do this when you have to use **Pipeline**? You can't use the `fit` method directly. – Deshwal Nov 11 '21 at 08:33
-
@Deshwal As this is a different type of inquiry and I would not like to delve into a response that is unrelated to the original, here's a decent article discussing such a thing: https://towardsdatascience.com/pipelines-custom-transformers-in-scikit-learn-the-step-by-step-guide-with-python-code-4a7d9b068156 – Simon Provost Nov 13 '21 at 14:46
-
For using `sample_weight` with `Pipeline`, there is an example here: https://stackoverflow.com/a/36224909/407108 – Justas Mar 13 '23 at 17:32
When using the sklearn wrapper, there is a parameter for weight. Example:
import xgboost as xgb

exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)
where the parameter should be array-like, of length N, equal to the length of the target.
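For instance, a minimal sketch of building such an array (the labels and weight values here are illustrative assumptions, not part of the original answer):

import numpy as np

# One weight per training row; y is assumed to be the label array.
class_weights = {0: 1.0, 1: 5.0}  # illustrative per-class weights
sample_weights_data = np.array([class_weights[label] for label in y])
assert len(sample_weights_data) == len(y)  # array-like, length N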

-
How can you use it with **Pipeline**? As you can't use `fit` directly inside a pipeline – Deshwal Nov 11 '21 at 08:32
-
@Deshwal As this is a different type of inquiry and I would not like to delve into a response that is unrelated to the original, here's a decent article discussing such a thing: https://towardsdatascience.com/pipelines-custom-transformers-in-scikit-learn-the-step-by-step-guide-with-python-code-4a7d9b068156 – Simon Provost Nov 13 '21 at 14:43
I recently ran into this problem, so I thought I'd leave a solution I tried.
import numpy as np
from xgboost import XGBClassifier

# Manually handling imbalance. Below is the same as computing
# float(18501)/392318 on the training dataset.
# We are going to inversely assign the weights.
weight_ratio = float(len(y_train[y_train == 0])) / float(len(y_train[y_train == 1]))

# Use a float dtype; an int array would truncate the weights to zero.
w_array = np.array([1.0] * y_train.shape[0])
w_array[y_train == 1] = weight_ratio
w_array[y_train == 0] = 1 - weight_ratio

xgc = XGBClassifier()
xgc.fit(x_df_i_p_filtered, y_train, sample_weight=w_array)
Not sure why, but the results were pretty disappointing. Hope this helps someone.
Reference: https://www.programcreek.com/python/example/99824/xgboost.XGBClassifier

-
Should be `w1 = np.array([1.0] * y_train.shape[0])`, initializing the numpy array's dtype as a float. Otherwise the following statements will result in a numpy array containing all zeros. – Diego Amicabile Dec 23 '18 at 18:49
`compute_sample_weight("balanced", y)` maps each label in `y` to a weight inversely proportional to its class frequency, returning the per-row array that `sample_weight` expects:

from sklearn.utils.class_weight import compute_sample_weight

xgb_classifier.fit(X, y, sample_weight=compute_sample_weight("balanced", y))

-
Please add some explanation to your answer before getting to the code implementation. – Ofek Hod May 29 '20 at 12:08
The answers here are outdated. The sample_weight parameter is no longer supported. It's replaced with scale_pos_weight.

Rather, just set scale_pos_weight = sum(negative instances) / sum(positive instances).
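A minimal sketch of that computation, assuming binary 0/1 labels in a NumPy array y_train (the variable names X_train and y_train are illustrative, not from the answer):

import numpy as np
import xgboost as xgb

# scale_pos_weight = sum(negative instances) / sum(positive instances)
ratio = float(np.sum(y_train == 0)) / float(np.sum(y_train == 1))

clf = xgb.XGBClassifier(scale_pos_weight=ratio)
clf.fit(X_train, y_train)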

You can alternatively use the `scale_pos_weight` hyperparameter, as discussed in the XGBoost docs. The advantage of this approach is that you don't have to construct the sample weight vector and don't have to pass it in at `fit` time.
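For illustration, a hedged sketch of that advantage: since `scale_pos_weight` is a constructor argument rather than a fit-time keyword, it also slots into a Pipeline without any fit-parameter routing (the weight value and variable names below are assumptions):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import xgboost as xgb

# The class-imbalance handling lives in the estimator's configuration,
# so nothing extra is passed to fit().
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", xgb.XGBClassifier(scale_pos_weight=10.0)),  # illustrative ratio
])
pipe.fit(X_train, y_train)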

-
Interesting. I tried it with my problem, and my question is: how different is this method from sample_weight in the fit method? If you have an insight about this, it would be amazing. – Simon Provost Nov 13 '21 at 13:44
Similar to the @Firas Omrane and @Pramit answers, but I think it is slightly more pythonic:
import numpy as np
from sklearn.utils import class_weight

class_weights = dict(
    zip(
        [0, 1],
        class_weight.compute_class_weight(
            'balanced', classes=np.unique(train['class']), y=train['class']
        ),
    )
)

xgb_classifier.fit(X, train['class'], sample_weight=class_weights)

-
The format of this `class_weights` is not the one expected by `xgb`. Could you elaborate on whether anything extra needs to be done to make it work? Thanks – juanbretti May 17 '21 at 08:17
-
@juanbretti Using Skibee's response will not work with XGBoost's scikit-learn implementation, since it requires a list the same size as your class target, holding the weight value for the i-th sample instead of 1, 0, or whatever the unique values in your column are. Thus, this answer is ideal for logging, for example, which class weights should be applied to your unique values. However, I would recommend using `class_weight.compute_sample_weight` when using it with the XGBoost scikit-learn implementation. Do you understand, or still confused? – Simon Provost Nov 13 '21 at 14:25