
I am currently learning how to perform data augmentation with the Keras ImageDataGenerator, following "Deep Learning with Python" by François Chollet.

I now have 1000 dog images and 1000 cat images in the training dataset.

I also have 500 dog images and 500 cat images in the validation dataset.

The book sets the batch size to 32 for both the training and validation generators to perform data augmentation, and specifies both "steps_per_epoch" and "epochs" when fitting the model.

However, when I train the model, I receive the TensorFlow warning "Your input ran out of data..." and the training process stops.

I searched online and many solutions mentioned that the steps should be set as steps_per_epoch = len(train_dataset) // batch_size and validation_steps = len(validation_dataset) // batch_size.

I understand the logic above, and with those settings there is no warning during training.

But I am wondering: originally I have 2000 training samples, which is too few, so I need to perform data augmentation to increase the number of training images. If steps_per_epoch = len(train_dataset) // batch_size is applied, since len(train_dataset) is only 2000, am I not still training the model on 2000 samples instead of feeding it additional augmented images?

from keras.preprocessing.image import ImageDataGenerator

# Augmentation is applied to the training data only.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# The validation data is only rescaled, never augmented.
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,  # the book's value; this is what triggers the warning
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)
kelvin.aaa2
1 Answer
ImageDataGenerator does not increase the size of the training set. All augmentations are done in memory: an original image is augmented randomly, and its augmented version is returned in place of the original. If you want to have a look at the augmented images, you need to set these parameters on flow_from_directory:

save_to_dir=path,
save_prefix="",
save_format="png",

Now you have 2000 images, and with a batch size of 32 you get 2000 // 32 = 62 full batches per epoch, but you are trying to run 100 steps, which is what causes the error.

If you have a dataset which does not generate batches and want to use all data points, then you should set:

steps_per_epoch = len(train_dataset) // batch_size

But when you use flow_from_directory, it generates the batches for you, so there is no need to set steps_per_epoch unless you want to use fewer data points than the generated batches provide.
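
As a sketch of what that means for the code in the question (assuming the generators defined there): the generator returned by flow_from_directory already knows its own length in batches, so the step counts can be derived from it instead of hard-coding 100:

# len(train_generator) == 63: 2000 images / batch size 32, final partial batch included.
history = model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator),        # recent Keras versions also let you omit this
    epochs=100,
    validation_data=validation_generator,
    validation_steps=len(validation_generator))  # 32 batches of validation data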

Frightera
  • Thanks for your answer. You say "All augmentations are done in memory", but the code does not specify how many new (augmented) images are produced. So how do I know how many more images are produced for training? Or do I not need to know, because it is all done randomly? – kelvin.aaa2 Jan 11 '21 at 07:28
  • @kelvin.aaa2 Keras' ImageDataGenerator accepts a batch of images used for training, then applies a series of random transformations to each image in the batch (whatever you've requested as augmentation). Then the key part: it replaces the original batch with the new, randomly transformed batch, and training is done on this randomly transformed batch. – Frightera Jan 11 '21 at 07:40
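
A quick way to see this behaviour (a sketch, assuming the train_generator from the question): every batch you pull is built on the fly in memory, already rescaled and randomly transformed.

# Pull one batch from the generator; it is produced on the fly, in memory.
images, labels = next(train_generator)
print(images.shape)                # (32, 150, 150, 3): one randomly transformed batch
print(labels.shape)                # (32,)
print(images.min(), images.max())  # pixel values already rescaled into [0, 1]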