I am using CatBoostClassifier for my binary classification model, and my dataset is highly imbalanced: 115000 samples of class 0 and 10000 samples of class 1. Can someone please guide me on how to use the following parameters of CatBoostClassifier:
1. class_weights
2. scale_pos_weight

From the documentation, my impression is that I can use the ratio of the number of negative samples to the number of positive samples, i.e. 115000/10000 = 11.5, as the value for scale_pos_weight, but I am not sure.

Please let me know which exact values to use for these two parameters and how to derive them.

Thanks

user1596433

1 Answer

For scale_pos_weight, use the number of negative samples divided by the number of positive samples. In your case that is 115000 // 10000 = 11 (I prefer to use whole numbers, so I round 11.5 down).
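A minimal sketch of deriving that value from a label vector, assuming the labels are plain 0/1 integers. The helper name `scale_pos_weight_from_labels` is hypothetical; only the `scale_pos_weight` key it feeds is a real CatBoostClassifier parameter.

```python
from collections import Counter


def scale_pos_weight_from_labels(labels, round_down=True):
    """Hypothetical helper: return (#negatives / #positives) for 0/1 labels."""
    counts = Counter(labels)
    neg, pos = counts[0], counts[1]
    if pos == 0:
        raise ValueError("no positive samples in labels")
    # Floor division mirrors the whole-number preference above;
    # round_down=False keeps the exact ratio instead.
    return neg // pos if round_down else neg / pos


# Label vector with the class counts from the question.
y = [0] * 115000 + [1] * 10000
params = {"scale_pos_weight": scale_pos_weight_from_labels(y)}
# These params would then be passed as CatBoostClassifier(**params).
print(params)  # {'scale_pos_weight': 11}
```

Passing `round_down=False` gives 11.5 for the same labels, which is the exact ratio from the question.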

For class_weights, you provide one weight per class, ordered by class label (class 0 first). In your case it would be: class_weights = (1, 11)
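Both parameters act as multipliers on the per-object weights, so for a binary problem class_weights = (1, 11) and scale_pos_weight = 11 should weight every sample the same way. The sketch below is a hypothetical model of that multiplier behaviour (the function `object_weight` is illustrative, not CatBoost's own code); it also refuses to combine the two, in line with using only one of them.

```python
def object_weight(label, base_weight=1.0, scale_pos_weight=None, class_weights=None):
    """Hypothetical model of how each parameter rescales one object's weight."""
    if scale_pos_weight is not None and class_weights is not None:
        # Illustrative guard: use only one of the two parameters.
        raise ValueError("set only one of scale_pos_weight / class_weights")
    if scale_pos_weight is not None:
        # Only objects of class 1 are rescaled; class 0 keeps its weight.
        return base_weight * (scale_pos_weight if label == 1 else 1.0)
    if class_weights is not None:
        # Each class gets its own multiplier, looked up by label.
        return base_weight * class_weights[label]
    return base_weight


for label in (0, 1):
    a = object_weight(label, scale_pos_weight=11)
    b = object_weight(label, class_weights=(1, 11))
    print(label, a, b)
```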

class_weights is more flexible: scale_pos_weight only applies to binary classification, whereas class_weights can also be defined for multiclass targets. For example, if you have 4 classes you can set: class_weights = (0.5, 1, 5, 25)
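One way to derive multiclass weights, assumed here as a common heuristic rather than a CatBoost rule, is to weight each class by (largest class count) / (its own count). For a binary problem where class 0 dominates this reduces to [1, neg/pos]. The helper name and the 4-class counts below are hypothetical, chosen only for illustration; labels are assumed to be integers 0..K-1 so the list index matches the class label.

```python
from collections import Counter


def relative_class_weights(labels):
    """Hypothetical helper: weight = largest class count / class count."""
    counts = Counter(labels)
    largest = max(counts.values())
    # Weights ordered by sorted class label (0, 1, 2, ...).
    return [largest / counts[c] for c in sorted(counts)]


# Hypothetical 4-class label counts, for illustration only.
y = [0] * 5000 + [1] * 2500 + [2] * 500 + [3] * 100
print(relative_class_weights(y))  # [1.0, 2.0, 10.0, 50.0]
```

The result could then be passed as class_weights; for the binary counts from the question it returns [1.0, 11.5].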

You only need to use one of the two parameters, not both. For a binary classification problem I would stick with scale_pos_weight.

Areza
Andreyn