I'm sure someone here has experienced something like this. I have a dataset with 4 classes, and one of them makes up only 3% of the data, so I considered it under-represented and tried some resampling approaches. But guess what? Its classification accuracy changed by only 1 or 2% at best compared with the accuracy without resampling! So I wonder: should a class that makes up 3% of the data be considered under-represented or not?
1 Answer
If we have a binary problem with classes split 97% to 3%, then you already get 97% accuracy by always predicting the first class. So the maximum improvement you can get in accuracy is 3 percentage points.
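To make the arithmetic concrete, here is a minimal sketch in plain Python. The 100-sample dataset and the `accuracy` helper are made up for illustration; they are not from the question's data.

```python
# Hypothetical binary dataset: 97 majority-class (0) and 3 minority-class (1) labels.
y_true = [0] * 97 + [1] * 3

# "Majority classifier": always predict the majority label, ignoring the input.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


baseline = accuracy(y_true, y_pred)
headroom = 1.0 - baseline  # the most any classifier could gain over the baseline

print(f"baseline accuracy: {baseline:.2f}")
print(f"maximum possible gain: {headroom:.2f}")
```

Any gain from resampling has to fit inside that small headroom, which is why a change of 1 or 2% in total accuracy says little about the minority class.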
Instead of total accuracy, you should look at per-class accuracy. If the 3% are the important objects (e.g. sales, where you earn money) you may be only interested in that one class.
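A simple approach would be the weighted mean accuracy, where you compute the accuracy of each class separately and then average the per-class values, so that every class counts equally regardless of its size (this is also known as balanced accuracy). The "majority classifier" above (always predicting the majority label) would then have only 50% weighted accuracy: the majority class is always predicted correctly, and the minority class is always predicted incorrectly.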
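The per-class and averaged accuracy can be sketched as follows. This is only an illustration under assumed data (a made-up 97/3 split), and the helper names `per_class_accuracy` and `mean_class_accuracy` are hypothetical, not from any library.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true class (i.e. per-class recall)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def mean_class_accuracy(y_true, y_pred):
    """Unweighted average of the per-class accuracies: each class counts equally."""
    per_class = per_class_accuracy(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


# Hypothetical data: 97 samples of class 0, 3 samples of class 1.
y_true = [0] * 97 + [1] * 3
y_pred = [0] * 100  # majority classifier: always predicts class 0

print(per_class_accuracy(y_true, y_pred))   # accuracy of each class
print(mean_class_accuracy(y_true, y_pred))  # average over classes
```

The same functions work unchanged for the 4-class case in the question. If you use scikit-learn, `sklearn.metrics.balanced_accuracy_score` computes the same average of per-class recall.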

Has QUIT--Anony-Mousse