Multiclass classification imbalance

Question

I have 5 different labels, with the following frequencies:

 '0': 23.21%
 '1': 17.64%
 '2': 29.64%
 '3': 16.96%
 '4': 12.57%

How can I evaluate whether this imbalance will badly affect my predictions? I have ~1800 records with 28 features each.

I thought about using cross-validation with a confusion matrix, but I'm not sure that is the right approach.

You can use class_weights. – Zabir Al Nazi, May 16 '20 at 10:36

score 0 · Answer 1 · answered May 16 '20 at 10:45

You can use a performance measure that takes into account the number of samples for each label, such as the micro- or weighted-averaged F1 score.
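To make the two averages concrete, here is a minimal pure-Python sketch of how micro- and weighted-averaged F1 can be computed from true and predicted labels. The helper name `f1_report` and the toy labels are hypothetical, chosen only for illustration; they are not taken from the question's data.

```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Return per-class F1 plus micro- and weighted-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_class = {}
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (predicted + support)
        denom = predicted[c] + support[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0

    n = len(y_true)
    # Micro: pool TP/FP/FN over all classes. For single-label
    # multiclass data this reduces to plain accuracy.
    micro = sum(tp.values()) / n
    # Weighted: per-class F1 averaged with weights equal to class support.
    weighted = sum(per_class[c] * support[c] for c in labels) / n
    return per_class, micro, weighted


# Hypothetical toy labels for 5 classes (illustration only).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4]
y_pred = [0, 0, 2, 1, 2, 2, 2, 2, 0, 3, 2, 2]

per_class, micro, weighted = f1_report(y_true, y_pred)
for label, score in per_class.items():
    print(f"class {label}: F1 = {score:.3f}")
print(f"micro F1    = {micro:.3f}")
print(f"weighted F1 = {weighted:.3f}")
```

The per-class scores are worth printing alongside the averages, since a rare class that is never predicted (class 4 in the toy labels) can be hidden by either average. If scikit-learn is available, `sklearn.metrics.f1_score` with `average="micro"` or `average="weighted"` should give the same numbers, and the same computation can be applied to predictions collected across cross-validation folds.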

Thomas Kok
