I have found general questions about using ImageDataGenerator from Keras to create training and validation sets (e.g. here).
I have a CSV file of image names with a binary label for each image. I want to split it into training, validation and test sets.
I wrote this:
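import pandas as pd
# Assumes the TensorFlow-bundled Keras; adjust this import for a standalone keras install.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Raw string so the regex separator '\s' reaches pandas unchanged.
traindf = pd.read_csv('/content/drive/images_and_labels.txt', dtype=str, sep=r'\s', engine='python')
traindf.columns = ['image', 'label', 'none1', 'none2', 'none3']
# drop() returns a new DataFrame, so the result has to be assigned back.
traindf = traindf.drop(['none1', 'none2', 'none3'], axis=1)
# Rescale pixel values to [0, 1] and reserve 25% of the rows for validation.
datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)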
train_generator=datagen.flow_from_dataframe(
dataframe=traindf,
directory="/content/drive/",
x_col="image",
y_col="label",
subset="training",
batch_size=32,
seed=42,
shuffle=True,
class_mode="categorical",
target_size=(150,150)
)
validation_generator=datagen.flow_from_dataframe(
dataframe=traindf,
directory="/content/drive/",
x_col="image",
y_col="label",
subset="validation",
batch_size=32,
seed=42,
shuffle=True,
class_mode="categorical",
target_size=(150,150)
)
What I don't understand is how to split the data into three sets: training, validation and test.
When I try to add a third generator, test_generator, with:
test_generator=datagen.flow_from_dataframe(
dataframe=traindf,
directory="/content/drive/",
x_col="image",
y_col="label",
subset="test",
batch_size=32,
seed=42,
shuffle=True,
class_mode="categorical",
target_size=(150,150)
)
I get: ValueError: Invalid subset name: test;expected "training" or "validation"
How can I use ImageDataGenerator to divide the rows of a CSV file into three data sets (training, validation and test), so that I can build a model with the training and validation sets and then evaluate it on the test set at the end? I saw here that this was not possible, but that question was asked more than 3.5 years ago, so I am wondering whether it is possible now.
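For reference, one possible workaround (a sketch, not a built-in ImageDataGenerator feature) is to hold out the test rows in pandas before any generator is created, so that `subset` only ever has to be "training" or "validation". The helper name `hold_out_test`, the 0.2 test fraction and the made-up file names below are all assumptions for illustration; the sketch also assumes the DataFrame has a unique index.

```python
import pandas as pd


def hold_out_test(df, test_frac=0.15, seed=42):
    """Split df into (train_val_df, test_df) by random row sampling.

    Hypothetical helper; assumes df has a unique index.
    """
    # Sample the test rows first; random_state makes the split repeatable.
    test_df = df.sample(frac=test_frac, random_state=seed)
    # Every row that was not sampled stays available for training/validation.
    train_val_df = df.drop(index=test_df.index)
    return train_val_df.reset_index(drop=True), test_df.reset_index(drop=True)


# Small made-up frame standing in for the real CSV (hypothetical file names).
demo_df = pd.DataFrame({
    "image": [f"img_{i}.jpg" for i in range(20)],
    "label": [str(i % 2) for i in range(20)],
})
train_val_df, test_df = hold_out_test(demo_df, test_frac=0.2)
print(len(train_val_df), len(test_df))
```

With a split like this, `train_val_df` could be passed as `dataframe=` to the two existing flow_from_dataframe calls (subset="training" and subset="validation"), while `test_df` would go to a separate ImageDataGenerator(rescale=1./255.) created without validation_split, with no `subset` argument and probably shuffle=False so that predictions stay aligned with the rows.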