I am trying to build a simple 5-class image classifier by extracting bottleneck features from a pre-trained VGG16 (trained on ImageNet). I have 10000 training images (2000 per class) and 2500 validation images (500 per class). However, once I extract the bottleneck features, the validation tensor has 2496 samples instead of the expected 2500. I have checked the data folder and confirmed that all 2500 validation images are present, yet when I run my training code I still get this error: "ValueError: Input arrays should have the same number of samples as target arrays. Found 2496 input samples and 2500 target samples". I have attached the code below; can anyone help me understand why the number of input samples is reduced to 2496?
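For context, the 2500 target samples mentioned in the error come from labels I build from the known per-class counts. My actual code does the equivalent of this sketch (class indices follow the alphabetical folder order that flow_from_directory uses):

import numpy as np

# 5 classes, 500 validation images each, in folder order
validation_labels = np.array([0] * 500 + [1] * 500 + [2] * 500 + [3] * 500 + [4] * 500)
print(len(validation_labels))  # 2500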
To be really sure that no images were missing, I counted the images in the train and validation directories with a quick script (see below). It turns out that no images are actually missing.
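This is a minimal sketch of that check, assuming the usual flow_from_directory layout (data/train/<class_name>/ and data/validation/<class_name>/, where each class folder contains only image files):

import os

for split in ('data/train', 'data/validation'):
    total = 0
    for class_name in sorted(os.listdir(split)):
        class_dir = os.path.join(split, class_name)
        count = len(os.listdir(class_dir))
        print(split, class_name, count)
        total += count
    print(split, 'total:', total)  # 10000 for train, 2500 for validation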
This is the code to get the bottleneck features.
import numpy as np
from datetime import datetime as dt
from keras import applications
from keras.preprocessing.image import ImageDataGenerator

global_start = dt.now()

# Dimensions of our Flickr images are 256 x 256
img_width, img_height = 256, 256

# Parameters needed for training and validation
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
epochs = 50
batch_size = 16

# Get the bottleneck features, i.e. the output of VGG16's convolutional base for each image
def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)
    # Load the pre-trained VGG16 model from Keras; include_top=False keeps only
    # the convolutional layers and drops the fully connected top layers.
    model = applications.VGG16(include_top=False, weights='imagenet')

    generator_tr = datagen.flow_from_directory(train_data_dir,
                                               target_size=(img_width, img_height),
                                               batch_size=batch_size,
                                               class_mode=None,  # class_mode=None means the generator won't load the class labels
                                               shuffle=False)    # no shuffling, so the samples stay in class order
    nb_train_samples = len(generator_tr.filenames)  # 10000: 2000 training samples per class
    bottleneck_features_train = model.predict_generator(generator_tr, nb_train_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_train.npy', bottleneck_features_train)  # a numpy array

    generator_ts = datagen.flow_from_directory(validation_data_dir,
                                               target_size=(img_width, img_height),
                                               batch_size=batch_size,
                                               class_mode=None,
                                               shuffle=False)
    nb_validation_samples = len(generator_ts.filenames)  # 2500: 500 validation samples per class
    bottleneck_features_validation = model.predict_generator(generator_ts, nb_validation_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_validation.npy', bottleneck_features_validation)

    print("Got the bottleneck features in time: ", dt.now() - global_start)
    num_classes = len(generator_tr.class_indices)
    return nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts

nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts = save_bottleneck_features()
This is the output of the above code snippet:
Found 10000 images belonging to 5 classes.
Found 2500 images belonging to 5 classes.
Got the bottleneck features in time: 1:56:44.166846
Now, if I load the saved features and check validation_data.shape, I get (2496, 8, 8, 512), whereas the expected shape is (2500, 8, 8, 512). The train_data shape is fine at (10000, 8, 8, 512). I did notice that 2496 is exactly divisible by my batch size (156 * 16 = 2496) while 2500 is not, so the batching may be involved, but I cannot see what I should change. I am new to debugging in Keras and cannot figure out what exactly is causing this problem. This is how I check the shapes:
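import numpy as np

train_data = np.load('weights/vgg16bottleneck_features_train.npy')
validation_data = np.load('weights/vgg16bottleneck_features_validation.npy')
print(train_data.shape)       # (10000, 8, 8, 512), as expected
print(validation_data.shape)  # (2496, 8, 8, 512), but I expected (2500, 8, 8, 512)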
Any help would be highly appreciated!