My dataset has 3 classes and 905 training examples; the class distribution is 220, 185, and 500.
I found that if I oversample the training data, then I have to correct/calibrate the predicted probabilities on the test data, because after oversampling the training and test distributions are no longer the same. This is nicely described here.
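To make sure I understand the correction: my reading of the linked post is that it is the standard prior-ratio adjustment, i.e. multiply each predicted class probability by (true prior / resampled prior) and renormalize. A minimal sketch of that (the function name is mine):

```python
import numpy as np

def correct_probabilities(probs, train_priors, true_priors):
    """Undo the effect of oversampling on predicted probabilities.

    probs:        (n_samples, n_classes) probabilities from the model
                  trained on the oversampled data
    train_priors: class proportions in the oversampled training set
    true_priors:  class proportions in the original data
    """
    # Re-weight each class by (true prior / oversampled prior),
    # then renormalize each row so it sums to 1 again.
    weights = np.asarray(true_priors) / np.asarray(train_priors)
    corrected = np.asarray(probs) * weights
    return corrected / corrected.sum(axis=1, keepdims=True)
```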
I have three questions:
1. Do I also have to apply this correction to predictions on the validation dataset (used for early stopping)?
2. Do I have to correct the probabilities when computing the loss?
3. Is this a mandatory step? I ask because it might hurt overall accuracy: the correction penalizes the predicted probabilities of the classes that have fewer examples (a small numeric illustration follows below).
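For concreteness, here is what the correction does with my class counts, assuming I oversample the two minority classes up to 500 each, so the resampled priors are uniform (the prediction values are made up for illustration):

```python
import numpy as np

true_priors = np.array([220, 185, 500]) / 905  # original class priors
train_priors = np.array([1, 1, 1]) / 3         # assuming 500/500/500 after oversampling

# A test prediction that slightly favors the smallest class (index 1):
p = np.array([0.30, 0.40, 0.30])

corrected = p * (true_priors / train_priors)
corrected /= corrected.sum()
print(corrected.round(3))  # [0.228 0.255 0.517]

# The minority class drops from 0.40 to about 0.26 and the argmax flips
# to the majority class, which is exactly the penalty I am worried about
# in question 3.
```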