I'm trying to improve the performance of a transfer learning model that uses Xception as the pre-trained base by adding data augmentation. The goal is to classify dog breeds. train_tensors and valid_tensors contain the training and validation images, respectively, as NumPy arrays.
from keras.applications.xception import Xception
from keras.preprocessing.image import ImageDataGenerator

# Pre-trained convolutional base, without the classification head
model = Xception(include_top=False, weights="imagenet")

# Augmentation applied on the fly to each batch
datagen = ImageDataGenerator(zoom_range=0.2,
                             horizontal_flip=True,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             fill_mode='nearest',
                             rotation_range=45)
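As a sanity check on the generator itself (illustrative only; the shapes in the comments are what I expect, not measured output), a single augmented batch can be pulled by hand, since datagen.flow yields (images, labels) tuples when targets are passed in:

# Illustrative sanity check: pull one augmented batch by hand.
batch_images, batch_labels = next(datagen.flow(train_tensors,
                                               train_targets,
                                               batch_size=32))
print(batch_images.shape)  # expected: (32, 224, 224, 3)
print(batch_labels.shape)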
batch_size = 32

# Extract bottleneck features from the augmented batches
bottleneck_train = model.predict_generator(datagen.flow(train_tensors,
                                                        train_targets,
                                                        batch_size=batch_size),
                                           train_tensors.shape[0] // batch_size)
bottleneck_valid = model.predict_generator(datagen.flow(valid_tensors,
                                                        valid_targets,
                                                        batch_size=batch_size),
                                           valid_tensors.shape[0] // batch_size)
print(train_tensors.shape)
print(bottleneck_train.shape)
print(valid_tensors.shape)
print(bottleneck_valid.shape)
However, the output from the last four lines is:
(6680, 224, 224, 3)
(6656, 7, 7, 2048)
(835, 224, 224, 3)
(832, 7, 7, 2048)
The predict_generator call is returning fewer samples than were provided to it, and the shortfall exactly matches the floor division used for the steps argument. Are samples being skipped or left out?
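For reference, the arithmetic behind that observation (just the numbers from the output above):

# The returned counts equal steps * batch_size for both sets:
print(6680 // 32, 208 * 32)  # 208 steps -> 6656 samples (matches bottleneck_train)
print(835 // 32, 26 * 32)    # 26 steps  -> 832 samples  (matches bottleneck_valid)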