I'm trying to compare the performance of multi-class logistic regression (OvR) and Random Forest, but my dataset is imbalanced: the label has 5 possible values. Does the class imbalance affect performance?
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Sep 08 '22 at 13:38
1 Answer
Yes (most likely).
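A model trained on imbalanced data tends to become biased towards the majority class and learns very little about the minority classes. If you have an imbalanced data set, first try training on the true distribution. If the model works well and generalizes, you're done! If not, try downsampling the majority class (training on only a fraction of its examples) and upweighting the examples you kept, so that each one counts for as many examples as it replaces.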
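A minimal sketch of downsampling plus upweighting, using only the standard library. The label counts, the `factor` of 10, and the helper name `downsample_and_upweight` are made up for illustration; they are not from the question.

```python
import random
from collections import Counter


def downsample_and_upweight(labels, majority_label, factor, seed=0):
    """Keep about 1/factor of the majority-class examples and give each kept
    one a weight of `factor`. All other examples keep weight 1.0.

    Returns a list of (index, weight) pairs into `labels`.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    keep_n = max(1, len(majority_idx) // factor)
    kept = set(rng.sample(majority_idx, keep_n))

    sample = []
    for i, y in enumerate(labels):
        if y != majority_label:
            sample.append((i, 1.0))            # minority example: unchanged
        elif i in kept:
            sample.append((i, float(factor)))  # kept majority example: upweighted
    return sample


# Hypothetical imbalanced label column with 5 classes.
labels = [0] * 1000 + [1] * 60 + [2] * 50 + [3] * 40 + [4] * 30
majority_label = Counter(labels).most_common(1)[0][0]

sample = downsample_and_upweight(labels, majority_label, factor=10)
class_counts = Counter(labels[i] for i, _ in sample)
majority_weight = sum(w for i, w in sample if labels[i] == majority_label)

print(class_counts)     # examples per class after downsampling
print(majority_weight)  # total weight carried by the majority class
```

The kept indices select the training rows, and the weights can then be passed as `sample_weight` to `fit`, which both scikit-learn's `LogisticRegression` and `RandomForestClassifier` accept. Both estimators also take `class_weight="balanced"`, which reweights classes without dropping any rows and may be simpler to try first.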

s510