Should the class proportions of the training set for a binary ML model match those of the test set, especially under extreme class imbalance?

Asked Nov 01 '22 at 19:05

Active Nov 01 '22 at 19:53

Viewed 15 times

I want to build a model that predicts a binary outcome whose positive class (class 1) has a very low incidence, for example 0.1% of the test set. Should the training set ideally have the same proportions, i.e. 1 class 1 example for every 999 class 0 examples? Or could I train on a balanced set (50% class 1, 50% class 0) but test on 0.1% class 1 and 99.9% class 0? Thanks in advance.
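To make the two setups concrete, here is a minimal sketch of the second one, using only the Python standard library. It assumes a synthetic dataset of 100,000 labels at a 0.1% positive rate and an 80/20 split; the helper names (`make_labels`, `split_indices`, `downsample_to_balance`, `positive_rate`) are hypothetical and purely illustrative. It only shows how the class proportions differ between the two sets and does not claim that either setup is the better one.

```python
import random

def make_labels(n, rate, rng):
    """Return n shuffled binary labels with round(n * rate) positives."""
    n_pos = round(n * rate)
    labels = [1] * n_pos + [0] * (n - n_pos)
    rng.shuffle(labels)
    return labels

def split_indices(labels, test_fraction, rng):
    """Stratified split: each class is split separately, so both
    parts keep (approximately) the natural class proportions."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * test_fraction)
        test.extend(idx[:cut])
        train.extend(idx[cut:])
    return train, test

def downsample_to_balance(indices, labels, rng):
    """Keep every class 1 index and an equally sized random sample
    of class 0 indices, giving a 50/50 subset."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    return pos + rng.sample(neg, len(pos))

def positive_rate(indices, labels):
    """Fraction of the given indices whose label is 1."""
    return sum(labels[i] for i in indices) / len(indices)

rng = random.Random(0)                      # fixed seed (assumption)
labels = make_labels(100_000, 0.001, rng)   # assumed size and 0.1% incidence
train_idx, test_idx = split_indices(labels, 0.2, rng)
balanced_train_idx = downsample_to_balance(train_idx, labels, rng)

print("train (natural): ", positive_rate(train_idx, labels))
print("train (balanced):", positive_rate(balanced_train_idx, labels))
print("test  (natural): ", positive_rate(test_idx, labels))
```

In this sketch only the training indices are rebalanced; the test set is left at its natural 0.1% rate, which is the scenario described in the second question.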

edited Nov 01 '22 at 19:53

jw32022

0 Answers