3

I was wondering if the fit_generator() in keras has any advantage in respect to memory usage over using the usual fit() method with the same batch_size as the generator yields. I've seen some examples similar to this:

from keras.datasets import mnist

def generator():
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    # some data prep
    ...
    while True:
        for i in range(1875):  # 1875 * 32 = 60000 -> number of training samples
            yield X_train[i*32:(i+1)*32], y_train[i*32:(i+1)*32]

If I pass this into the fit_generator() method or just pass all the data directly into the fit() method and define a batch_size of 32, would it make any difference regarding (GPU?)-memory whatsoever?
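For context, in terms of the batches the model actually sees the two approaches are equivalent; a quick sketch with dummy numpy arrays (standing in for the MNIST data, which is assumed here) shows that the slicing in the generator produces exactly the batches fit() would form internally with batch_size=32:

```python
import numpy as np

# Dummy stand-in for the MNIST training set (60000 samples of 28x28).
X_train = np.zeros((60000, 28, 28), dtype=np.float32)
y_train = np.zeros((60000,), dtype=np.int64)

def generator(batch_size=32):
    num_batches = len(X_train) // batch_size  # 1875 for MNIST
    while True:  # loop forever; fit_generator stops it via steps_per_epoch
        for i in range(num_batches):
            yield (X_train[i * batch_size:(i + 1) * batch_size],
                   y_train[i * batch_size:(i + 1) * batch_size])

gen = generator()
X_batch, y_batch = next(gen)
print(X_batch.shape, y_batch.shape)  # (32, 28, 28) (32,)
```

Either way, one batch of 32 samples at a time is what reaches the model.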

V1nc3nt

1 Answer

3

Yes, the difference comes in when you need augmented data to improve model accuracy.

For efficiency, fit_generator() allows real-time data augmentation of images on the CPU. The GPU stays busy with model training and weight updates, while the work of augmenting images and supplying the training batches is delegated to the CPU.
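Keras ships this as keras.preprocessing.image.ImageDataGenerator. Purely to illustrate the idea, here is a simplified, hypothetical stand-in in plain numpy that yields randomly flipped copies of each batch (the real class offers rotations, shifts, zoom, shear, and more):

```python
import numpy as np

def augmenting_generator(X, y, batch_size=32, seed=0):
    """Yield batches with a random horizontal flip applied on the CPU.

    A toy illustration of what ImageDataGenerator does: augmentation
    runs on the CPU while the GPU trains on the previous batch.
    """
    rng = np.random.default_rng(seed)
    num_batches = len(X) // batch_size
    while True:
        for i in range(num_batches):
            X_batch = X[i * batch_size:(i + 1) * batch_size].copy()
            flip = rng.random(len(X_batch)) < 0.5   # flip ~half the images
            X_batch[flip] = X_batch[flip, :, ::-1]  # horizontal flip
            yield X_batch, y[i * batch_size:(i + 1) * batch_size]

X = np.arange(64 * 28 * 28, dtype=np.float32).reshape(64, 28, 28)
y = np.zeros(64, dtype=np.int64)
X_batch, y_batch = next(augmenting_generator(X, y))
print(X_batch.shape)  # (32, 28, 28)
```

Because the flipped images are created on the fly, they never have to exist upfront in memory, which is the point being made above.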

petezurich
kishore V M
  • But on the GPU the memory usage would still be the same, is that correct? It always has to load the full batch_size into GPU memory, with both fit() and fit_generator()? – V1nc3nt Jul 25 '17 at 10:02
  • 1
    Yes, you are correct. But if you have to introduce augmentation, you can use Keras's ImageDataGenerator as the generator. It provides many features like rotations, zoom, shear, etc., which create new images from the existing training data that are not available upfront, and this additional processing can be handled in parallel. Take a look at https://keras.io/preprocessing/image/ for ImageDataGenerator – kishore V M Jul 25 '17 at 10:42
  • Thanks for your help. I was hoping for a method that allowed me to load only part of one batch into the model in order to save GPU memory, but it seems you always have to provide the full batch. E.g. with a batch_size of 10, I would like a way of providing 5 samples and then another 5 separately before the backprop is done. Maybe there is some restriction on why this can't work. – V1nc3nt Jul 25 '17 at 10:52
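What the last comment describes is usually called gradient accumulation: run the forward/backward pass on sub-batches, average the gradients, and apply a single update. Keras's fit() does not expose this directly, but the arithmetic works out, as a minimal numpy sketch for a least-squares model (not the asker's actual network) shows: accumulating over two half-batches reproduces the full-batch gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))   # one batch of 10 samples, 3 features
y = rng.standard_normal(10)
w = rng.standard_normal(3)

def grad(X, y, w):
    """Gradient of the mean squared error mean((Xw - y)^2) w.r.t. w."""
    return 2.0 / len(X) * X.T @ (X @ w - y)

# Full-batch gradient (what a batch_size=10 update would use).
g_full = grad(X, y, w)

# Gradient accumulation: two sub-batches of 5, averaged before the update.
g_accum = (grad(X[:5], y[:5], w) + grad(X[5:], y[5:], w)) / 2.0

print(np.allclose(g_full, g_accum))  # True
```

Only one sub-batch's activations have to live in GPU memory at a time, which is why this trick reduces peak memory at the cost of extra forward/backward passes per update.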