
I'm training a sentence-pair binary classification model with RoBERTa, but the model is not able to learn the positive class (label 1). My dataset is imbalanced:

training data -
0 --- 140623
1 --- 5537

validation data -
0 --- 35156
1 --- 1384

Training results in 0 true positives and 0 false positives on the validation data, i.e. the model predicts only the negative class. During evaluation I calculate macro F1, but how do I take care of the class imbalance during training? Several articles claim that BERT handles imbalance by itself, but that doesn't seem to happen in my case.
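For context, this is my own back-of-the-envelope calculation: with the validation counts above, a model that only ever predicts class 0 still gets a macro F1 near 0.49, because the majority-class F1 is close to 1. So a mediocre-looking macro F1 can hide a model that has learned nothing about class 1:

```python
# Macro F1 when the model predicts only class 0 on the validation split
# (35,156 negatives, 1,384 positives) -- illustrative, using the counts above.
def f1(tp, fp, fn):
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

neg, pos = 35156, 1384
f1_class0 = f1(tp=neg, fp=pos, fn=0)   # every sample predicted as 0
f1_class1 = f1(tp=0, fp=0, fn=pos)     # no positives ever predicted
macro_f1 = (f1_class0 + f1_class1) / 2
print(round(macro_f1, 4))  # ~0.49 despite learning nothing about class 1
```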

I am using this dataset.

Any help is appreciated.

MT0
Sonu Gupta
1 Answer


If you are using TensorFlow, you can add weights to your samples or to your classes, so that you keep the full dataset while balancing the loss: https://datascience.stackexchange.com/questions/13490/how-to-set-class-weights-for-imbalanced-classes-in-keras
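For example, here is a sketch of the standard "balanced" weighting heuristic (`n_samples / (n_classes * n_class_samples)`), using the class counts from the question; the resulting dict is in the format Keras's `model.fit` accepts via its `class_weight` argument:

```python
# Inverse-frequency ("balanced") class weights for the question's counts.
# Each class is weighted by n_samples / (n_classes * n_class_samples),
# so the rare class contributes as much total loss as the common one.
counts = {0: 140623, 1: 5537}
n_samples = sum(counts.values())
class_weight = {c: n_samples / (len(counts) * n) for c, n in counts.items()}
print(class_weight)  # class 1 gets ~13x the weight of class 0

# With Keras, the dict can be passed straight to training:
# model.fit(x_train, y_train, class_weight=class_weight, ...)
```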

I am not using PyTorch, but I assume there are equivalents.
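One such equivalent is the `weight` argument of `torch.nn.CrossEntropyLoss`. Its per-example effect is simply scaling the log-loss by the true label's weight, which this plain-Python sketch shows (the weight values are assumed from the question's roughly 25:1 class ratio):

```python
import math

# Weighted cross-entropy for one example: loss = -w[y] * log(p[y]).
# This is the per-sample quantity nn.CrossEntropyLoss(weight=w) computes
# in PyTorch (before its reduction step).
def weighted_ce(probs, label, weights):
    return -weights[label] * math.log(probs[label])

probs = [0.9, 0.1]   # model is confident in class 0
w = [0.52, 13.2]     # minority class up-weighted (assumed values)

unweighted = -math.log(probs[1])      # misclassified positive, weight 1
weighted = weighted_ce(probs, 1, w)   # same mistake, up-weighted
print(weighted / unweighted)  # positives now dominate the gradient
```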

Clément Perroud