
I am trying to use Keras flow_from_directory to train a model, but it does not repeat the data after an epoch (i.e. once all the data has been iterated over), and I could not find any option to make it do so. Below is my code for data generation during training. For example, if total images = 70 and batch_size = 32, then the 1st and 2nd iterations give 32 images each, but the 3rd gives only 6 images.

# data generation from directory without labels  
trn = datagen.flow_from_directory(os.path.join(BASE, 'train_gen'),
                                         batch_size=batch_size,
                                         target_size=(inp_shape[:2]),
                                         class_mode=None)
X = trn.next() # getting a batch of data.

I want the data generator to start repeating data after it's exhausted.
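
In other words, every call should give me a full batch, with the iterator starting over once it runs out. Roughly like this sketch (next_full_batch is just a name I made up for the behaviour I want):

import numpy as np

def next_full_batch(it, batch_size):
    # return exactly batch_size images, restarting the iterator when it runs out
    batch = it.next()
    while batch.shape[0] < batch_size:
        it.reset()  # start over from the first image
        batch = np.concatenate([batch, it.next()])[:batch_size]
    return batch

X = next_full_batch(trn, batch_size)  # always 32 images, even on the 3rd call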

Actually, I am trying to train a GAN: a batch of images is produced by the Generator model, concatenated with a batch of real images, and then passed to the Discriminator model and the GAN model for training. I can't figure out how to use fit_generator for this. The code is below:

# module-level imports used below
import os
import numpy as np
import scipy.ndimage as ndi
from keras.preprocessing.image import ImageDataGenerator

def train(self, inp_shape, batch_size=1, n_epochs=1000):
    BASE = '/content/gdrive/My Drive/Dataset/GAN'

    datagen = ImageDataGenerator(rescale=1./255)

    # distorted input images for the generator (same seed as trn_real
    # so the two generators stay in step)
    trn_dist = datagen.flow_from_directory(os.path.join(BASE, 'train_gen'),
                                           batch_size=batch_size,
                                           target_size=inp_shape[:2],
                                           seed=1360000,
                                           class_mode=None)

    # validation images (not used in the loop below yet)
    val_dist = datagen.flow_from_directory(os.path.join(BASE, 'test_gen'),
                                           batch_size=batch_size,
                                           target_size=inp_shape[:2],
                                           class_mode=None)

    # real images for the discriminator
    trn_real = datagen.flow_from_directory(os.path.join(BASE, 'train_real'),
                                           batch_size=batch_size,
                                           target_size=inp_shape[:2],
                                           seed=1360000,
                                           class_mode=None)

    for e in range(n_epochs):
        real_images = trn_real.next()
        dist_images = trn_dist.next()

        # generate fake images and resize them to the discriminator input size
        gen_images = self.generator.predict(dist_images)
        factor = inp_shape[0] / 250
        gen_res = ndi.zoom(gen_images, (1, factor, factor, 1), order=2)

        # mixed batch: real images labelled 1, generated images labelled 0
        X = np.concatenate([real_images, gen_res])
        y = np.zeros(2 * batch_size)
        y[:batch_size] = 1.

        # train the discriminator on the mixed batch
        self.discriminator.trainable = True
        self.discriminator.fit(X, y, batch_size=batch_size, epochs=n_epochs)

        # freeze the discriminator and train the combined GAN model
        self.discriminator.trainable = False
        self.model.fit(gen_res, y[:batch_size])

        print('> training --- epoch=%d/%d' % (e, n_epochs))
        if e > 0 and e % 2000 == 0:
            self.model.save('%s/models/gan_model_%d_.h5' % (BASE, e))

PS: I am new to GANs, so please correct me if I am doing something wrong.
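
For reference, this is the per-batch pattern I am trying to follow, adapted from common Keras GAN examples (a rough sketch using train_on_batch instead of fit; it assumes self.model is the stacked generator + discriminator that takes the generator's input):

import numpy as np

# one training iteration: real images labelled 1, generated images labelled 0
real_images = trn_real.next()
dist_images = trn_dist.next()
gen_images = self.generator.predict(dist_images)
# (resize gen_images here, as in the code above, if its shape differs from real_images)

X = np.concatenate([real_images, gen_images])
y = np.zeros(2 * batch_size)
y[:batch_size] = 1.

self.discriminator.trainable = True
d_loss = self.discriminator.train_on_batch(X, y)

# freeze the discriminator and push the generator towards "real" labels
self.discriminator.trainable = False
g_loss = self.model.train_on_batch(dist_images, np.ones(batch_size))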

danishansari

4 Answers


To shed some light on the problem: first, you need to know the parameters of flow_from_directory and fit_generator. batch_size determines the number of samples loaded for each computation, and epochs determines the number of times you want Keras to pass through all of your data. In essence, if you set epochs=2 and batch_size=32, Keras will go through all of your data twice, splitting it into mini-batches of 32 samples. So what's missing in your code is essentially the epochs parameter. I also recommend setting steps_per_epoch and validation_data. steps_per_epoch determines the number of batches in each epoch, so to visit all of your samples in each epoch, set it as follows:

model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.samples / train_generator.batch_size,
                    epochs=10,
                    validation_data=validation_generator,
                    validation_steps=validation_generator.samples / validation_generator.batch_size)
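
Note that in Python 3 samples / batch_size is a float; if your Keras version complains about a non-integer steps_per_epoch, round up so the final, smaller batch is still visited (a minimal sketch of the same call):

import math

steps = math.ceil(train_generator.samples / train_generator.batch_size)  # e.g. ceil(70 / 32) = 3
val_steps = math.ceil(validation_generator.samples / validation_generator.batch_size)

model.fit_generator(train_generator,
                    steps_per_epoch=steps,
                    epochs=10,
                    validation_data=validation_generator,
                    validation_steps=val_steps)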
Mohammad Siavashi
  • Thanks for this clarification. I am trying to train a GAN, where a batch of images has to be passed to the model, so I am using a loop over n_epochs rather than fit_generator. Please find the updated code. – danishansari Sep 17 '19 at 09:06

The flow_from_directory method is made to be used with the fit_generator function. The fit_generator function allows you to specify the number of epochs.

model.fit_generator(trn, epochs=epochs)

where model refers to the model object you want to train. This should solve your problem. These functions are well explained in the Keras documentation.

JimmyOnThePage
  • I am trying to train a GAN, where inputs from two sources are concatenated, resized and then passed to a model, so I can only work with one batch at a time. So is there no way to set a repeat mode in flow_from_directory? – danishansari Sep 17 '19 at 07:59

You can always specify the steps_per_epoch argument of fit_generator. This makes the generator repeat data whenever steps_per_epoch > total_samples // batch_size.
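
For example, with the numbers from the question (a sketch; trn is the flow_from_directory iterator and model is whatever model you are fitting):

# 70 images with batch_size=32 gives 70 // 32 = 2 full batches per pass;
# asking for more steps makes the generator wrap around and repeat data
model.fit_generator(trn, steps_per_epoch=4, epochs=10)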

Srihari Humbarwadi

I found a hack to make the 2nd generator (the one with fewer images) "reset" its index, so it outputs a batch of 32 images instead of the 6 images you mentioned earlier.

Looking at your code, I suppose trn_real is the generator with more images and trn_dist is the one with fewer images. At each iteration, compare the batch shapes; if they are not equal (meaning the smaller generator has reached the end of its index and output fewer images), reset that generator as follows:

real_images = trn_real.next()
dist_images = trn_dist.next()
if real_images.shape != dist_images.shape:
    trn_dist.reset() # reset the generator with fewer images
    dist_images = trn_dist.next()
Jun Yitt