I have seen general questions about using Keras's ImageDataGenerator to build training and validation sets (e.g. here).

I have a CSV file of image names with a binary label for each image. I want to divide it into training, validation and test sets.

I wrote this:

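import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A regex separator needs a raw string and the python parsing engine
traindf = pd.read_csv('/content/drive/images_and_labels.txt', dtype=str, sep=r'\s', engine='python')
traindf.columns = ['image', 'label', 'none1', 'none2', 'none3']
# drop() returns a new dataframe, so the result has to be assigned back
traindf = traindf.drop(['none1', 'none2', 'none3'], axis=1)

datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)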

train_generator=datagen.flow_from_dataframe(
  dataframe=traindf,
  directory="/content/drive/",
  x_col="image",
  y_col="label",
  subset="training",
  batch_size=32,
  seed=42,
  shuffle=True,
  class_mode="categorical",
  target_size=(150,150)
)

validation_generator=datagen.flow_from_dataframe(
  dataframe=traindf,
  directory="/content/drive/",
  x_col="image",
  y_col="label",
  subset="validation",
  batch_size=32,
  seed=42,
  shuffle=True,
  class_mode="categorical",
  target_size=(150,150)
)

What I don't understand is how to split that into three data sets, training, test and validation?

When I just try to make another generator called test with:

test_generator=datagen.flow_from_dataframe( dataframe=traindf, directory="/content/drive/", x_col="image", y_col="label", subset="test", batch_size=32, seed=42, shuffle=True, class_mode="categorical", target_size=(150,150) )

I get: ValueError: Invalid subset name: test;expected "training" or "validation"

How do I use ImageDataGenerator to divide a CSV file into three data sets (training, validation and test), so that I can build a model with the training and validation sets and then evaluate it on the test set at the end? I saw here that this was not possible, but that question is more than 3.5 years old, so I am wondering whether it is possible now.

Slowat_Kela

1 Answer

I would recommend creating a separate dataframe for your test data; you could even split it off automatically in a separate script. Then load that dataframe with its own generator, without a subset argument:

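# test_df, image_dir and IMAGE_SIZE are placeholders to define yourself;
# x_col and y_col must match the column names of your own dataframe.
test_generator = datagen.flow_from_dataframe(
    test_df,
    directory=image_dir,
    x_col='filename',
    y_col='labels',
    class_mode='categorical',
    shuffle=False,  # keep file order so predictions line up with the rows
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=1)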
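
The separate test dataframe mentioned above could be produced with pandas alone before any generator is created. The following is only a minimal sketch under assumptions: the helper name `split_off_test` is made up for illustration, the column names `image` and `label` are taken from the question, the 20% test fraction is arbitrary, and the toy dataframe stands in for the real CSV.

```python
import pandas as pd


def split_off_test(df, test_frac=0.2, label_col="label", seed=42):
    """Return (trainval_df, test_df), sampling the test rows per label.

    Hypothetical helper: assumes df has a unique index.
    """
    # Sample the same fraction from each label so both classes reach the test set.
    test_df = df.groupby(label_col, group_keys=False).sample(
        frac=test_frac, random_state=seed
    )
    # Rows that were not sampled remain available for training and validation.
    trainval_df = df.drop(test_df.index)
    return trainval_df.reset_index(drop=True), test_df.reset_index(drop=True)


# Toy stand-in for the CSV in the question (file names are invented).
df = pd.DataFrame({
    "image": [f"img_{i}.jpg" for i in range(100)],
    "label": ["0"] * 60 + ["1"] * 40,
})

trainval_df, test_df = split_off_test(df)
print(len(trainval_df), len(test_df))
```

With a split like this, `trainval_df` would be passed to the two `flow_from_dataframe` calls that use `subset="training"` and `subset="validation"`, while `test_df` goes to a third call with no `subset` and `shuffle=False`. As far as I know, `validation_split` is only applied when `subset` is given, so the test generator should yield every row of `test_df`.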
brad