I know that we need balanced data in y to get a better model. However, I'm wondering whether we need balanced data in the independent variables as well.
In the following dataframe, X3 is a categorical independent variable.
X1 X2 X3 y
22 67 1 0
33 87 1 0
55 66 1 0
77 12 1 0
28 68 1 1
12 64 2 0
19 17 2 1
10 62 2 1
88 19 2 1
99 20 2 1
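While the data in y is balanced overall (1:1 distribution), the distribution of y within each X3 category is imbalanced (4:1): category 1 has four rows with y = 0 and one with y = 1, and category 2 has the reverse.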
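To make the counts concrete, here is a minimal sketch that rebuilds the table and cross-tabulates X3 against y. It assumes pandas is the dataframe library in use; the names `df`, `y_counts`, `x3_counts`, and `table` are only illustrative.

```python
import pandas as pd

# Rebuild the example dataframe from the table above.
df = pd.DataFrame({
    "X1": [22, 33, 55, 77, 28, 12, 19, 10, 88, 99],
    "X2": [67, 87, 66, 12, 68, 64, 17, 62, 19, 20],
    "X3": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "y":  [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
})

# Marginal counts: how many rows fall in each class of y and each category of X3.
y_counts = df["y"].value_counts().sort_index()
x3_counts = df["X3"].value_counts().sort_index()

# Joint counts: rows are X3 categories, columns are y classes.
# X3 = 1 has four rows with y = 0 and one with y = 1; X3 = 2 is the reverse.
table = pd.crosstab(df["X3"], df["y"])

print(y_counts)
print(x3_counts)
print(table)
```

Both marginals come out even (five rows per class of y and five rows per category of X3), so the 4:1 ratio only shows up in the joint counts.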
Do I need an equal distribution within each X3 category as well?