I am following the official TensorFlow/Keras docs on image classification, in particular the section on data augmentation. There it says:

Data augmentation takes the approach of generating additional training data from your existing examples by augmenting them using random transformations that yield believable-looking images. This helps expose the model to more aspects of the data and generalize better.

So my understanding is that - if, for example, I don't have many training images - I want to generate additional training data by creating new, augmented images in addition to the existing training images.

The Keras docs linked above then show how some preprocessing layers from the layers.experimental.preprocessing module are added as the first layers of the example's Sequential model. In theory that makes sense: those preprocessing layers augment the input data (= images) before they "enter" the actual TF model.
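
For context, here is a minimal sketch of the pattern the tutorial shows, with the augmentation layers prepended to the model (the specific layers and argument values are illustrative, not copied verbatim from the docs):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Augmentation layers sit at the front of the model and randomly transform
    # each batch on the fly during training (they are inactive at inference time).
    data_augmentation = tf.keras.Sequential([
        layers.experimental.preprocessing.RandomFlip("horizontal", input_shape=(180, 180, 3)),
        layers.experimental.preprocessing.RandomRotation(0.1),
        layers.experimental.preprocessing.RandomZoom(0.1),
    ])

    model = tf.keras.Sequential([
        data_augmentation,
        layers.experimental.preprocessing.Rescaling(1.0 / 255),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(5),  # number of classes
    ])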

However, as quoted above, what I thought we want to do is create additional images, i.e. create new, more images in addition to the existing training images. But how would such a set of preprocessing layers in the model create additional images? Wouldn't they simply (randomly) augment the existing training images before they enter the model, but not create new, additional images?

  • I have the exact same question. Did you find any answer? – Tae-Sung Shin Nov 04 '22 at 23:42
  • @Tae-SungShin Nope, we switched to PyTorch quite a while ago ... – Matthias Nov 05 '22 at 15:12
  • Thanks. I think I found the answer. Based on the training log, the augmentation layers do not produce additional images but randomly transform the original images. In this case, the user has to provide multiple copies of the original images as input to the model to increase the amount of generated data. – Tae-Sung Shin Nov 05 '22 at 15:51

1 Answer

It is creating additional images, but that doesn't necessarily mean it will create new image files on disk.

If saving files is what you're trying to do, ImageDataGenerator can do that with the save_to_dir argument.

Wouldn't they simply (randomly) augment the existing training images before they enter the model, but not create new, additional images?

Yes, it creates new images, but it doesn't create new files on your machine. You can use this:

datagen = ImageDataGenerator()  # flow_from_directory is an instance method, so instantiate first
datagen.flow_from_directory(directory, target_size=(256, 256), save_to_dir='augmented', save_prefix='aug', save_format='png')
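
For completeness, a minimal runnable sketch of that; the directory names are hypothetical, and note that the augmented files are only written out as batches are actually drawn from the generator:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

    flow = datagen.flow_from_directory(
        "data/train",                  # hypothetical: one subfolder per class
        target_size=(256, 256),
        batch_size=32,
        save_to_dir="data/augmented",  # directory must already exist
        save_prefix="aug",
        save_format="png",
    )

    # Saving happens lazily: each batch you draw writes its augmented images to disk.
    for _ in range(5):
        next(flow)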

  • I don't need to save the newly augmented images to disk, but I do want to tell the system how many additional training images to create before/while training the model. Let's say I have two image classes, one with 400 images, the other with 650 images. Now I want to train the model with 1000 images per class. In both cases (either with the preprocessing layers or with the `ImageDataGenerator`), how can I set **how many** additional images should be created? – Matthias Oct 16 '20 at 13:03
  • That's not really what you asked in the post, but for that you can just iterate over `ImageDataGenerator` and save the desired number of images. Then you have complete control over how much of what should be used. Or use a customized `tf.data.Dataset` with [transformations](https://stackoverflow.com/questions/64374691/apply-different-data-augmentation-to-part-of-the-train-set-based-on-the-category/64375641#64375641). – Nicolas Gervais Oct 16 '20 at 13:14
  • Well, I thought that I have asked exactly that because I wrote that "I want to generate **additional** training data by creating new, augmented images **in addition** to the existing training images." Anyways, thanks for your answer, I will check out the approach described in your last comment. – Matthias Oct 16 '20 at 13:21
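
Regarding the "how many" follow-up above, here is a minimal sketch of the `tf.data.Dataset` approach suggested in the comments, assuming the same kind of preprocessing layers as in the question (directory, image size and repeat factor are illustrative):

    import tensorflow as tf
    from tensorflow.keras import layers

    augment = tf.keras.Sequential([
        layers.experimental.preprocessing.RandomFlip("horizontal"),
        layers.experimental.preprocessing.RandomRotation(0.1),
    ])

    # Hypothetical layout: data/train/<class_name>/*.jpg
    ds = tf.keras.preprocessing.image_dataset_from_directory(
        "data/train", image_size=(180, 180), batch_size=32)

    # Every pass applies fresh random transforms, so repeat(3) yields roughly
    # three differently augmented variants of each original image per epoch.
    augmented_ds = (
        ds.repeat(3)
          .map(lambda x, y: (augment(x, training=True), y),
               num_parallel_calls=tf.data.experimental.AUTOTUNE)
          .prefetch(tf.data.experimental.AUTOTUNE)
    )

    # model.fit(augmented_ds, epochs=10)

This scales the amount of augmented data globally; hitting an exact per-class count (e.g. 1000 images per class) would still require handling each class directory separately, as suggested in the first comment.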