Imbalanced learning problem - out of sample vs validation

Question

I am training a classifier on three classes: one dominant majority class making up about 80% of the data, and two minority classes of roughly equal size. Using undersampling / oversampling techniques, I can train a model that reaches a validation accuracy of 67%, which would already be quite good for my purposes. The issue is that this performance only appears on the balanced validation data. Once I test out of sample on imbalanced data, the model seems to have picked up a bias towards predicting the classes evenly. I have also tried weighted loss functions, but with no joy out of sample either. Is there a good way to ensure that the validation performance carries over? I have also validated the model with AUROC, with good results, but again the strong performance is only present on the balanced validation data.

Methods of resampling I have tried: SMOTE oversampling and random undersampling.

score 0 · Answer 1 · answered Jul 24 '19 at 11:49

If I understood correctly, you are looking for a better way to measure performance and to get better classification results on imbalanced datasets.

On imbalanced datasets, accuracy on its own is usually high and misleading, because the minority classes can be ignored entirely. Use the F1-score or precision/recall instead.
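To make the point about accuracy concrete, here is a minimal sketch in plain Python. The 80/10/10 label split mirrors the question, but the data and the helper names (`per_class_prf`, `accuracy`, `macro_f1`) are made up for illustration, and the "model" is just a stand-in that always predicts the majority class.

```python
def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical imbalanced test set: 80% class 0, 10% class 1, 10% class 2.
y_true = [0] * 80 + [1] * 10 + [2] * 10
# Stand-in "model" that ignores the minority classes completely.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")  # accuracy=0.800 macro_f1=0.296
```

Accuracy looks respectable at 0.80 even though two of the three classes are never predicted, while the macro-averaged F1 drops to about 0.30. In practice a library implementation such as scikit-learn's `classification_report` would likely be more convenient than hand-rolled helpers.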

For my project work on imbalanced datasets, I used SMOTE sampling along with K-fold cross-validation.

Cross-validation helps check that the model is learning genuine patterns from the data and is not picking up too much noise.
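One way to combine resampling with K-fold cross-validation is sketched below. It rests on an assumption that is not spelled out above: the resampling is applied only to the training part of each fold, so every validation fold keeps the original class ratio and the scores are closer to what imbalanced out-of-sample data would give. Plain random oversampling stands in for SMOTE here to keep the example dependency-free, and the helper names `stratified_kfold` and `oversample` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, val_idx) pairs that keep class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced


# Hypothetical labels with the 80/10/10 split from the question.
labels = [0] * 80 + [1] * 10 + [2] * 10

splits = []
for train_idx, val_idx in stratified_kfold(labels, k=5):
    # Balance the training part only; the validation fold stays imbalanced.
    balanced_train = oversample(train_idx, labels)
    splits.append((balanced_train, val_idx))
    # A model would be fitted on balanced_train and scored on val_idx here.

first_train, first_val = splits[0]
print(Counter(labels[i] for i in first_train))
print(Counter(labels[i] for i in first_val))
```

Each training set ends up with equal class counts, while each validation fold still has roughly 80% majority-class samples. If the imbalanced-learn package is available, its `Pipeline` with a `SMOTE` step is designed to apply resampling during fitting only, which should give the same separation.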

Reference: What is the correct procedure to split the Data sets for classification problem?
