Below what percentage can we say a class is under-represented (the data is imbalanced)?

Question

I'm sure someone here has run into something like this. I have a dataset with 4 classes, one of which makes up only 3% of the data. I treated it as under-represented and tried some resampling approaches, but guess what? Its classification accuracy changed by only 1 or 2% at best compared with the accuracy without resampling! So I wonder: should a class that makes up 3% of the data be considered under-represented or not?

score 2 · Answer 1 · answered Nov 04 '16 at 07:12

If we have a binary problem with classes split 97% to 3%, then you already get 97% correct by always predicting the first class. So the maximum improvement you can get in accuracy is 3 percentage points.
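To make that arithmetic concrete, here is a minimal sketch on made-up labels (1000 samples, 970 of class 0 and 30 of class 1; the data and variable names are only illustrative, not taken from the question):

```python
# Hypothetical labels: 970 samples of class 0 and 30 samples of class 1.
y_true = [0] * 970 + [1] * 30

# A "majority classifier" ignores the input and always predicts class 0.
y_pred = [0] * len(y_true)

# Overall accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Best possible gain over this baseline (a perfect classifier scores 1.0).
headroom = 1.0 - accuracy

print(f"baseline accuracy: {accuracy:.2%}")
print(f"maximum possible improvement: {headroom:.2%}")
```

So even a perfect model can only move overall accuracy from 97% to 100% here, which is why a 1 or 2% change after resampling is not surprising.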

Instead of total accuracy, you should look at per-class accuracy. If the 3% are the important objects (e.g. sales, where you earn money), you may only be interested in that one class.

A simple approach would be the weighted mean accuracy: compute the accuracy of each class separately, then average them (this is often called balanced accuracy). The "majority classifier" above (always predicting the majority label) would then have 50% weighted accuracy in the binary case, since the majority class is always predicted correctly and the minority class is always predicted incorrectly.
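A rough sketch of per-class accuracy and its mean, again on made-up 97%/3% labels. The helper names `per_class_accuracy` and `mean_per_class_accuracy` are hypothetical, written just for this illustration:

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {label: correct[label] / total[label] for label in total}

def mean_per_class_accuracy(y_true, y_pred):
    """Average the per-class accuracies, giving every class equal weight."""
    per_class = per_class_accuracy(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)

# Hypothetical data: 970 samples of class 0, 30 of class 1.
y_true = [0] * 970 + [1] * 30
# Majority classifier: always predict class 0.
y_pred = [0] * len(y_true)

per_class = per_class_accuracy(y_true, y_pred)
score = mean_per_class_accuracy(y_true, y_pred)

print(per_class)  # accuracy for each class separately
print(score)      # mean of the per-class accuracies
```

The majority classifier gets every class-0 sample right and every class-1 sample wrong, so the mean comes out at 0.5 even though overall accuracy is 0.97. If you use scikit-learn, `sklearn.metrics.balanced_accuracy_score` should give the same number for this case.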

