
My dataset has 3 classes and 900 training examples. The class distribution is 220, 185, and 500.

I found that if I oversample the training data, I have to correct/calibrate the predicted probabilities on the test data, because after oversampling the training and test class distributions are no longer the same. This is nicely described here.
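For reference, one common way to undo the effect of oversampling is to re-weight each predicted probability by the ratio of the true class prior to the (post-oversampling) training prior and renormalize. Below is a minimal sketch of that correction, assuming the oversampled training set is perfectly balanced; `correct_probs` is a hypothetical helper name, not from any library:

```python
import numpy as np

# Original class priors: 220, 185, 500 out of 900 training examples.
orig_counts = np.array([220, 185, 500])
orig_priors = orig_counts / orig_counts.sum()

# Assumption: after oversampling, the training set is balanced.
train_priors = np.array([1 / 3, 1 / 3, 1 / 3])

def correct_probs(p, train_priors, orig_priors):
    """Re-weight predicted probabilities by the ratio of the true
    priors to the training priors, then renormalize to sum to 1."""
    w = p * (orig_priors / train_priors)
    return w / w.sum(axis=-1, keepdims=True)

# Example: a prediction from a model trained on the balanced data.
p_pred = np.array([0.3, 0.2, 0.5])
print(correct_probs(p_pred, train_priors, orig_priors))
```

Note that this correction shrinks the probabilities of the oversampled (minority) classes and boosts the majority class, which is exactly the concern raised in question 3 below.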

I have three questions:

  1. Do I also have to do this when predicting on the validation dataset (used for early stopping)?

  2. Do I have to correct the probabilities for the loss calculation?

  3. Is this a mandatory step? I ask because it might hurt overall accuracy, since the correction penalizes the predicted probabilities of the classes with fewer examples.

user3363813
