I am working with an imbalanced dataset whose class variable takes two values: 0 and 1.

The number of '0' values is 1000 and the number of '1' values is 3000.

For XGBClassifier, LGBMClassifier and CatBoostClassifier I found a parameter called "scale_pos_weight" that makes it possible to adjust the weights of the class values:

scale_pos_weight = number_of_negative_values / number_of_positive_values

My question is: how can we know which value of the class variable is treated as positive and which as negative?

jartymcfly

2 Answers

@jartymcfly: Usually, the positive class is the label 1 and the negative class is the label 0.

scale_pos_weight = len(y[y == 0]) / len(y[y == 1])
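
As a minimal sketch of the formula above, the ratio can be computed from the label counts. The label list `y` is a hypothetical stand-in built to match the counts in the question (1000 zeros, 3000 ones), and it assumes that label 1 is the positive class:

```python
from collections import Counter

# Hypothetical labels matching the question: 1000 zeros and 3000 ones.
y = [0] * 1000 + [1] * 3000

counts = Counter(y)
n_negative = counts[0]  # assumption: label 0 is the negative class
n_positive = counts[1]  # assumption: label 1 is the positive class

# Ratio of negative to positive samples, as in the formula from the question.
scale_pos_weight = n_negative / n_positive
print(f"negatives={n_negative}, positives={n_positive}, "
      f"scale_pos_weight={scale_pos_weight:.3f}")
```

The resulting value could then be passed to the classifier, for example `XGBClassifier(scale_pos_weight=scale_pos_weight)`.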
Alex Ivanov
  • According to this post (https://machinelearningmastery.com/xgboost-for-imbalanced-classification/) on machinelearningmastery: "For an imbalanced binary classification dataset, the negative class refers to the majority class (class 0) and the positive class refers to the minority class (class 1)." I have not tried it myself, though. – joostblack Dec 15 '20 at 10:54

For an imbalanced binary classification dataset, the usual convention is that the positive class is the minority class (class 1) and the negative class is the majority class (class 0).

In your data, however, class 0 (1000 samples) is the minority class and class 1 (3000 samples) is the majority class.

The default value of scale_pos_weight is 1; values greater than 1 increase the weight of the positive class.
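
To make the difference concrete, here is a small sketch comparing the ratio for the labels as given with the ratio after flipping them so that the minority class becomes label 1. The helper `pos_weight` and the label list `y` are hypothetical, introduced only for illustration, and whether flipping is appropriate depends on which class you actually care about:

```python
def pos_weight(labels, positive_label=1):
    """Return count(negative) / count(positive) for binary labels."""
    n_pos = sum(1 for v in labels if v == positive_label)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos

# Hypothetical labels matching the question: 1000 zeros and 3000 ones.
y = [0] * 1000 + [1] * 3000

# Labels as given: label 1 (the majority) is the positive class,
# so the ratio is below the default of 1.
w_as_is = pos_weight(y)

# Flipped labels: the original class 0 (the minority) becomes label 1,
# so the ratio is above 1.
y_flipped = [1 - v for v in y]
w_flipped = pos_weight(y_flipped)

print(f"as given: {w_as_is:.3f}, flipped: {w_flipped:.1f}")
```

In both cases the ratio scales the positive class so that its total weight matches that of the negative class. Note that after flipping, a predicted 1 corresponds to the original class 0.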

Vit