ValueError: when using sklearn's train_test_split

Question

I have images and labels that I want to divide into training and validation sets, using the code below. With `stratify=train_labels`, `train_test_split` keeps the class proportions the same in both sets.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical  # import path assumed

val_split = 0.25
X_train, X_val, y_train, y_val = train_test_split(train_images, train_labels, test_size=val_split, stratify=train_labels)
X = np.concatenate((X_train, X_val))
y = np.concatenate((y_train, y_val))
y = to_categorical(y)

Then I use Keras's ImageDataGenerator:

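# import path assumed; the question does not show it
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(val_split)
training_generator = datagen.flow(X, y, batch_size=64, subset='training', seed=7)
validation_generator = datagen.flow(X, y, batch_size=64, subset='validation', seed=7)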

But I get the following error:

ValueError: Training and validation subsets have different number of classes after the split. If your numpy arrays are sorted by the label, you might want to shuffle them.

Where am I going wrong?

Keras is warning you that your validation set contains classes that are not present in your training set. Try adding `shuffle=True` to `train_test_split`. - drops, Aug 10 '20 at 09:27
Adding `shuffle=True` didn't help. I also printed the unique values in `y_train` and `y_val`; both have 4 unique values, as expected. I don't understand why I still get the same error. - iamkk, Aug 10 '20 at 09:46
How do you set the `validation_split` argument of `ImageDataGenerator`? `ImageDataGenerator(val_split)` is not correct; you should pass the value explicitly as a keyword argument. Further, if you have already split the data in a stratified way using `train_test_split`, why not use that split directly? That is, why do you want `ImageDataGenerator` to do the split? - today, Aug 10 '20 at 22:23
@today, yes, I used the split from `train_test_split` and it solved the problem. I also had other issues with the data, which I was able to figure out as well. Thank you. - iamkk, Aug 12 '20 at 10:39
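
The point about `ImageDataGenerator(val_split)` can be illustrated with a small pure-Python sketch. It assumes that the first positional parameter of `ImageDataGenerator` is `featurewise_center` rather than `validation_split`, and that `flow(..., subset=...)` compares the classes on either side of `int(len(x) * validation_split)`. `make_datagen` and `check_subsets` are hypothetical stand-ins written for this example, not Keras functions.

```python
# Simplified stand-in for the suspected cause; this is NOT Keras source code.


def make_datagen(featurewise_center=False, samplewise_center=False,
                 validation_split=0.0):
    """Hypothetical stand-in for ImageDataGenerator(...): returns its settings."""
    return {
        "featurewise_center": featurewise_center,
        "samplewise_center": samplewise_center,
        "validation_split": validation_split,
    }


def check_subsets(labels, validation_split):
    """Assumed check: both sides of the split index must hold the same classes."""
    split_idx = int(len(labels) * validation_split)
    val_classes = sorted(set(labels[:split_idx]))
    train_classes = sorted(set(labels[split_idx:]))
    if val_classes != train_classes:
        raise ValueError("Training and validation subsets have different "
                         "number of classes after the split.")
    return split_idx


labels = [0, 1, 2, 3] * 25  # 100 interleaved labels, 4 classes

# Positional call, as in the question: 0.25 lands in featurewise_center.
wrong = make_datagen(0.25)
# Keyword call: 0.25 reaches validation_split.
right = make_datagen(validation_split=0.25)

try:
    check_subsets(labels, wrong["validation_split"])
    positional_call_failed = False
except ValueError:
    # validation_split stayed at 0.0, so the validation slice is empty
    positional_call_failed = True

split_idx = check_subsets(labels, right["validation_split"])
```

Under these assumptions, the positional call leaves `validation_split` at 0.0, so the validation slice is empty and the class check fails no matter how the arrays are shuffled. As noted in the comments, what resolved the issue was using the `train_test_split` output directly, presumably by calling `datagen.flow` once on `X_train`, `y_train` and once on `X_val`, `y_val` without the `subset` argument.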
