I have 5 different labels, with the following frequencies:
'0': 23.21%
'1': 17.64%
'2': 29.64%
'3': 16.96%
'4': 12.57%
How can I evaluate whether this imbalance will hurt my predictions? I have ~1800 records with 28 features each.
I thought about using cross-validation with a confusion matrix, but I'm not sure that's the right approach.
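
To make the idea concrete, here is a minimal sketch in plain Python of the checks I have in mind: the accuracy of always predicting the most frequent label (a baseline any model should beat), the ratio between the largest and smallest class, and per-class recall read off a confusion matrix. The helper names are my own, and the confusion matrix is a made-up 3-class toy used only to illustrate the calculation; it is not a result from my data.

```python
# Label frequencies from the question, in percent.
freq = {"0": 23.21, "1": 17.64, "2": 29.64, "3": 16.96, "4": 12.57}


def majority_baseline(freq):
    """Accuracy (in %) obtained by always predicting the most frequent label."""
    return max(freq.values())


def imbalance_ratio(freq):
    """Frequency of the largest class divided by that of the smallest class."""
    return max(freq.values()) / min(freq.values())


def per_class_recall(cm):
    """Recall per class, where cm[i][j] counts true label i predicted as j."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal of cm."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def balanced_accuracy(cm):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


# Hypothetical toy confusion matrix (3 classes, invented numbers).
toy_cm = [
    [18, 1, 1],  # 20 samples of the largest class
    [2, 6, 2],   # 10 samples
    [1, 2, 2],   # 5 samples of the smallest class
]

print("majority baseline (%):", majority_baseline(freq))
print("imbalance ratio:", round(imbalance_ratio(freq), 2))
print("toy accuracy:", round(accuracy(toy_cm), 3))
print("toy per-class recall:", per_class_recall(toy_cm))
print("toy balanced accuracy:", round(balanced_accuracy(toy_cm), 3))
```

In the toy matrix, plain accuracy is higher than balanced accuracy because the smallest class has the lowest recall, which is the kind of effect I would want the evaluation to expose. With scikit-learn, I assume `StratifiedKFold`, `confusion_matrix`, and `balanced_accuracy_score` would cover the same steps on real folds.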