I am dealing with imbalanced data and trying to improve my model by using a stratified train/validation split, but I am unsure how to do this correctly. Nothing I have tried so far changes anything.
It should be something like this:
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, shuffle=True, random_state=0, stratify=y_train)
but it makes no difference whether I pass the "stratify" parameter or not. My data is one-hot encoded, and y_train looks like this:
[[1. 0.] [1. 0.] [0. 1.] ... [0. 1.] [0. 1.] [1. 0.]]
As far as I understand, stratify needs the labels of my two classes, but I am unsure how to provide them.
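A minimal sketch of one way to give stratify the two classes, assuming the one-hot rows can be collapsed to integer labels with np.argmax. The toy arrays and the names X, y_onehot and y_labels are made up for illustration and are not my real data. Whether a 2-D one-hot array is accepted directly by stratify may depend on the scikit-learn version, so the 1-D labels are used here to sidestep that question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 90 of class 0 and 10 of class 1.
labels = np.array([0] * 90 + [1] * 10)
X = np.arange(300, dtype=float).reshape(100, 3)
y_onehot = np.eye(2)[labels]          # shape (100, 2), rows like [1. 0.] / [0. 1.]

# Collapse each one-hot row to its class index (0 or 1).
y_labels = np.argmax(y_onehot, axis=1)

# Split the one-hot targets, but stratify on the 1-D class labels.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y_onehot,
    test_size=0.2, shuffle=True, random_state=0,
    stratify=y_labels,
)

# Column sums of a one-hot array are the per-class counts.
print("train counts:", y_tr.sum(axis=0))
print("val counts:  ", y_val.sum(axis=0))
```

In this toy setup the 90/10 ratio should carry over to both parts of the split, while the targets themselves stay one-hot encoded.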
EDIT: It makes no difference whether I set stratify = y_train or not, because the dimensions of y_train don't change.
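A small sketch of a check that looks at class proportions instead of array dimensions, since test_size alone fixes the shapes whether or not stratify is passed. The data and the helper val_class_fractions are hypothetical and only serve as illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1.
labels = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)
y = np.eye(2)[labels]                 # one-hot targets, shape (100, 2)

def val_class_fractions(stratify):
    """Return the validation target shape and the per-class fractions."""
    _, _, _, y_val = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=0, stratify=stratify
    )
    return y_val.shape, y_val.mean(axis=0)

shape_plain, frac_plain = val_class_fractions(None)
shape_strat, frac_strat = val_class_fractions(labels)

print("without stratify:", shape_plain, frac_plain)
print("with stratify:   ", shape_strat, frac_strat)
```

Both calls return validation targets of the same shape. The stratified split keeps the 90/10 ratio, whereas the plain split may drift from it depending on the random seed, so the proportions are the thing to compare.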
Thanks!