I have images and labels, and I want to divide them into training and validation sets with the code below. Because I pass stratify=train_labels, train_test_split should give both sets roughly the same class proportions.
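import numpy as np
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator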
val_split = 0.25
X_train, X_val, y_train, y_val = train_test_split(train_images, train_labels, test_size=val_split, stratify=train_labels)
X = np.concatenate((X_train, X_val))
y = np.concatenate((y_train, y_val))
y = to_categorical(y)
Then I use the Keras ImageDataGenerator:
datagen = ImageDataGenerator(val_split)
training_generator = datagen.flow(X, y, batch_size=64, subset='training', seed=7)
validation_generator = datagen.flow(X, y, batch_size=64, subset='validation', seed=7)
But I get the following error:
ValueError: Training and validation subsets have different number of classes after the split. If your numpy arrays are sorted by the label, you might want to shuffle them.
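To narrow this down, here is a small NumPy-only sketch of the kind of check that could produce this message. It is a guess at the behaviour, not the actual Keras code: subsets_share_classes is a hypothetical helper, and it assumes the arrays are cut by position at int(len(y) * validation_split) and that the unique labels on each side are then compared.

```python
import numpy as np

def subsets_share_classes(y, validation_split):
    """Hypothetical helper: cut y by position and compare the label sets."""
    # Assumption: the split point is a plain index into the array,
    # so the order of the samples decides which classes land on each side.
    split_idx = int(len(y) * validation_split)
    first, second = y[:split_idx], y[split_idx:]
    return np.array_equal(np.unique(first), np.unique(second))

# 4 classes, 50 samples each, interleaved: 0, 1, 2, 3, 0, 1, 2, 3, ...
interleaved = np.tile(np.arange(4), 50)
# Same labels, but sorted by class: 0, 0, ..., 1, 1, ..., 3, 3
sorted_labels = np.repeat(np.arange(4), 50)

print(subsets_share_classes(interleaved, 0.25))    # both sides see every class
print(subsets_share_classes(sorted_labels, 0.25))  # first side only sees class 0
print(subsets_share_classes(interleaved, 0.0))     # first side is empty
```

Under that assumption, the check fails both when the labels are sorted and when the split fraction works out to 0, so it may be worth confirming which value of validation_split the generator actually received.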
Where am I going wrong?