I'm working on my thesis and used a CatBoost classifier for binary classification on a very imbalanced dataset:

  • class0 = x number of samples
  • class1 = 10*x number of samples

To improve the model's performance, I changed the class weights, giving a higher weight to the minority class. I then ran a grid search with cross-validation to find the set of hyperparameters that minimizes the cross-entropy loss of the CatBoost model.
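To make the weighting step concrete, here is a minimal sketch of one common scheme, inverse class frequency. It is only an illustration and not necessarily the exact weights used here: the helper name `balanced_class_weights` and the toy labels (with x = 10) are assumptions made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels mirroring the 1:10 ratio above (x = 10 is an arbitrary choice).
labels = [0] * 10 + [1] * 100
weights = balanced_class_weights(labels)

# The minority class (class0) gets a weight 10 times larger than class1.
print(weights)
```

As far as I understand, a mapping like this can be given to CatBoost through the `class_weights` parameter of `CatBoostClassifier`, and `auto_class_weights="Balanced"` is meant to do something similar automatically.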

At this point I also changed the classification threshold by maximizing the G-mean metric (the square root of sensitivity multiplied by specificity).
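The threshold step can be sketched roughly as follows. This is a simplified illustration in plain Python, not the exact code from the thesis: the function names, the toy labels and the toy scores are all made up, and label 1 is assumed to be the positive class. In practice the scores would be predicted probabilities on held-out validation data.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (positive label = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def best_threshold(y_true, scores):
    """Try every distinct score as a cut-off and keep the one with the highest G-mean."""
    best_t, best_g = 0.5, -1.0
    for t in sorted(set(scores)):
        y_pred = [1 if s >= t else 0 for s in scores]
        g = g_mean(y_true, y_pred)
        if g > best_g:  # strict comparison keeps the lowest threshold on ties
            best_t, best_g = t, g
    return best_t, best_g


# Made-up validation labels and predicted probabilities for class 1.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.2, 0.4, 0.7, 0.3, 0.6, 0.8, 0.9, 0.95]

threshold, score = best_threshold(y_true, scores)
print(threshold, score)
```

The selected threshold then replaces the default 0.5 cut-off when turning predicted probabilities into class labels.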

In your opinion, if you are experienced with or informed about boosting-type ensemble methods, is it right to proceed this way to improve the model's performance when the dataset is imbalanced? Or would it be enough to change the weights and use the grid search, without also changing the classification threshold?

Thank you in advance!
