
I'm using ResNet with TensorFlow to train a model with 20 classes.

My problem is that 6-7 classes have a lot of samples, about the same number of classes have a medium number of samples, and the rest have only a few. With this distribution, my model had a strong tendency to predict the larger classes over the smaller ones. I tried to balance the classes by reducing the number of samples in my large classes, which helped give the smaller classes a place in the predictions, but now I've reached a point where I can't push the model past 90% accuracy, and I feel like I'm losing a lot of valuable information by cutting samples from my large classes.

So, before I go buy more samples, I'm wondering if there is a way to work with unbalanced classes so that the model becomes very good at recognizing whether the larger classes are present (since it has so many samples of them that it is extremely capable of detecting their presence), and then, if they are absent, checks which of the other classes are present. The idea is to use all the samples I have instead of discarding them.

I've already tried the class-weight option in Keras/TensorFlow, but it didn't help.

1 Answer

Besides the undersampling technique you have used so far, there are two other common ways to deal with imbalanced data:

  1. class weighting
  2. oversampling

Oversampling is the opposite of what you did, i.e. you train on samples from the under-represented classes multiple times. Class weighting means telling the model how much weight to give each class's samples during training (i.e. when computing the weight updates of a neural network). Both approaches are supported by TensorFlow, and you can find them in the official tutorials.
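As a minimal, pure-NumPy sketch of both ideas (the label counts here are made up for illustration): the `class_weight` dict below has the shape Keras' `model.fit(..., class_weight=...)` expects, and the resampled index array implements naive oversampling with replacement.

```python
import numpy as np

# Toy label array: class 0 heavily over-represented (counts are illustrative).
y = np.array([0] * 70 + [1] * 20 + [2] * 10)

# --- Class weighting ---
# Weight each class inversely to its frequency; this dict can be passed
# directly to Keras via model.fit(X, y, class_weight=class_weight).
counts = np.bincount(y)
class_weight = {i: len(y) / (len(counts) * c) for i, c in enumerate(counts)}

# --- Oversampling ---
# Resample indices with replacement so every class contributes equally;
# train on X[balanced_idx], y[balanced_idx] instead of the raw arrays.
rng = np.random.default_rng(0)
n_per_class = counts.max()
balanced_idx = np.concatenate([
    rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=True)
    for c in range(len(counts))
])
y_balanced = y[balanced_idx]
```

Both tricks aim at the same thing: making each gradient update see the small classes as often (or as heavily) as the large ones, without throwing away any of the large-class samples.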

Mehraban