
I have 5 different labels, with the following relative frequencies:

 '0': 23.21%
 '1': 17.64%
 '2': 29.64%
 '3': 16.96%
 '4': 12.57%

How can I evaluate whether this imbalance will hurt my predictions? I have ~1800 records with 28 features each.

I thought about using cross-validation with a confusion matrix, but I'm not sure that is the right approach.

desertnaut

1 Answer


You can use a performance measure that takes into account the number of samples for each label, such as the micro- or weighted-averaged F1 score.
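As a rough illustration of how these averages differ, here is a minimal pure-Python sketch. The function name `f1_report` and the toy labels are made up for the example and are not from your data. In practice, scikit-learn's `f1_score` with `average="micro"`, `"macro"`, or `"weighted"` should give the same numbers. Note that for single-label multiclass problems the micro-averaged F1 equals plain accuracy, so comparing it with the macro average (which ignores class frequency) is one way to see whether the smaller classes are being predicted worse.

```python
from collections import Counter

def f1_report(y_true, y_pred):
    """Return per-class F1 plus macro, weighted, and micro averages.

    Hypothetical helper for illustration only.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_class = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (predicted + actual)
        denom = predicted[label] + support[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    n = len(y_true)
    macro = sum(per_class.values()) / len(labels)             # every class counts equally
    weighted = sum(per_class[l] * support[l] for l in labels) / n  # weighted by class size
    micro = sum(tp.values()) / n   # equals accuracy for single-label multiclass
    return per_class, macro, weighted, micro

# Toy data (invented): class 0 is the majority, classes 1 and 2 are rarer.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2]

per_class, macro, weighted, micro = f1_report(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f} micro={micro:.3f}")
# macro=0.730 weighted=0.781 micro=0.800
```

In this toy case the macro average is lowest because the two rare classes have a lower F1 than the majority class. With cross-validation you would compute the same scores on the out-of-fold predictions.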

Thomas Kok