My dataset has around 150k records with four labels, ['A','B','C','D'], distributed as follows:
A: 60000
B: 50000
C: 36000
D: 4000
When I use scikit-learn's classification_report to get the precision, recall, and f1-score, the f1-score triggers an UndefinedMetricWarning because class D is never predicted, due to its low number of records.
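To illustrate why the metric becomes undefined, here is a minimal sketch in plain Python. It is not scikit-learn's implementation; the function name per_class_metrics and the toy labels are made up for illustration. It shows that when a class is never predicted, its precision is 0/0, so the f1-score has no defined value.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}; None marks an undefined value.

    Hypothetical helper for illustration only, not a library function.
    """
    true_count = Counter(y_true)   # how often each label actually occurs
    pred_count = Counter(y_pred)   # how often each label is predicted
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    metrics = {}
    for label in labels:
        # Precision is TP / (number of predictions of this label): 0/0 if never predicted.
        precision = tp[label] / pred_count[label] if pred_count[label] else None
        # Recall is TP / (number of true occurrences of this label).
        recall = tp[label] / true_count[label] if true_count[label] else None
        # F1 is left undefined if either input is undefined or both are zero.
        if precision is None or recall is None or precision + recall == 0:
            f1 = None
        else:
            f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = (precision, recall, f1)
    return metrics

# Toy data (made up): the single 'D' record is misclassified as 'A',
# and 'D' is never predicted at all.
y_true = ["A", "A", "B", "B", "C", "D"]
y_pred = ["A", "A", "B", "C", "C", "A"]

metrics = per_class_metrics(y_true, y_pred, ["A", "B", "C", "D"])
for label, (precision, recall, f1) in metrics.items():
    print(label, precision, recall, f1)
```

For class D, the precision and f1 entries come out as None, which is the situation the warning is reporting.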
I know that I need to oversample or undersample to fix the class imbalance.
Question: Would it be a good idea to fix the imbalance by randomly sampling 4000 records from each class so that the classes are balanced?
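To make the proposal concrete, here is a rough sketch of the random undersampling I have in mind, using only the standard library. The function name undersample_to_smallest, the fixed seed, and the integer placeholder records are assumptions for illustration; the class counts are the ones listed above.

```python
import random
from collections import Counter, defaultdict

def undersample_to_smallest(records, labels, seed=0):
    """Randomly keep the same number of records per class, equal to the
    size of the smallest class. Hypothetical helper for illustration."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for record, label in zip(records, labels):
        by_label[label].append(record)

    n_keep = min(len(group) for group in by_label.values())

    out_records, out_labels = [], []
    for label, group in by_label.items():
        # Sample without replacement from each class.
        for record in rng.sample(group, n_keep):
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels

# Class counts from the question; the records are integer placeholders
# standing in for the real feature rows.
counts = {"A": 60000, "B": 50000, "C": 36000, "D": 4000}
labels = [label for label, n in counts.items() for _ in range(n)]
records = list(range(len(labels)))

bal_records, bal_labels = undersample_to_smallest(records, labels)
balanced_counts = Counter(bal_labels)
kept_fraction = len(bal_labels) / len(labels)

print(balanced_counts)
print(f"kept {len(bal_labels)} of {len(labels)} records ({kept_fraction:.1%})")
```

With these counts, the balanced set has 4000 records per class, so it keeps 16,000 of the 150,000 records (about 10.7%) and discards the rest.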