I'm working with an extremely class-imbalanced data set (positives make up ~0.1% of examples) and have explored a number of sampling techniques to improve model performance, measured by AUPRC. Since I only have a few thousand positive examples and several million negative examples, I have mostly explored downsampling the negatives. In general, this approach has produced almost no discernible improvement when the model is evaluated on an imbalanced test set that reflects the true class distribution.
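For concreteness, here is a minimal sketch of the setup I'm describing, assuming a scikit-learn/XGBoost workflow. The synthetic data, the 10:1 downsampling ratio, and the hyperparameters are stand-ins, not my actual pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score
from xgboost import XGBClassifier

# Synthetic stand-in for the real data: ~0.1% positives.
X, y = make_classification(
    n_samples=200_000, n_features=20, weights=[0.999], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Downsample only the training negatives (here, 10 negatives per positive);
# the test set keeps the true class ratio.
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_train == 1)
neg = np.flatnonzero(y_train == 0)
keep = np.concatenate([pos, rng.choice(neg, size=10 * len(pos), replace=False)])

model = XGBClassifier(n_estimators=300, eval_metric="aucpr")
model.fit(X_train[keep], y_train[keep])

# AUPRC on the untouched, imbalanced test set.
scores = model.predict_proba(X_test)[:, 1]
print("AUPRC (imbalanced test):", average_precision_score(y_test, scores))
```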
However, as an experiment I tried downsampling both the training and test sets, and found an order-of-magnitude (10x) increase in performance. This finding held for both XGBoost and a simple fully connected MLP.
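The experiment, continuing the sketch above (same model and variables; the 10:1 ratio is again just an illustrative choice), only changes the test set that gets scored:

```python
# Additionally downsample the test negatives at the same 10:1 ratio,
# then re-score the same model on that reduced test set.
test_pos = np.flatnonzero(y_test == 1)
test_neg = np.flatnonzero(y_test == 0)
test_keep = np.concatenate(
    [test_pos, rng.choice(test_neg, size=10 * len(test_pos), replace=False)]
)

scores_ds = model.predict_proba(X_test[test_keep])[:, 1]
print("AUPRC (downsampled test):",
      average_precision_score(y_test[test_keep], scores_ds))
```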
This suggests to me that the model can in fact distinguish the classes, but I can't figure out how to adjust a model trained on the more balanced (downsampled) training set so that it shows a similar performance gain when evaluated on the imbalanced test set. Any suggestions?