Keras Functional model giving high validation accuracy but incorrect prediction

Question

I am trying to do transfer learning with the VGG16 architecture, using the 'ImageNet' pretrained weights, on the PASCAL VOC 2012 dataset. PASCAL VOC is a multi-label image dataset with 20 classes, so I have modified the built-in VGG16 model like this:

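from keras.applications import vgg16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

def VGG16_modified():
    base_model = vgg16.VGG16(include_top=True, weights='imagenet', input_shape=(224, 224, 3))
    base_model.summary()  # summary() prints the table itself and returns None
    # Take the output of the last pooling layer of the convolutional base
    x = base_model.get_layer('block5_pool').output
    x = GlobalAveragePooling2D()(x)
    # One sigmoid unit per class for multi-label prediction
    predictions = Dense(20, activation='sigmoid')(x)

    # The Keras 2 keyword arguments are `inputs` and `outputs`
    final_model = Model(inputs=base_model.input, outputs=predictions)
    final_model.summary()
    return final_model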

and my input image preprocessing is like this:

img_val = []
for i in tqdm(range(dfval.shape[0])):
    img = image.load_img(train_images + y_val[0][i], target_size=(224, 224))
    img = image.img_to_array(img)
    img_val.append(img)
x_val = np.array(img_val)

I have converted the categorical labels with pd.get_dummies into multi-hot vectors for the 20 classes, e.g. [[0 0 0 0 1 0 0 0 0 1 0 .... ]]. The corresponding labels have shape (number of image samples, 20), and the input images have shape (number of image samples, 224, 224, 3).

When I trained the model for several epochs, I saw very good validation accuracy (around 90%), but when I used the same validation dataset to predict labels, the model gave the same class output for every image.

I trained the model like this:

model = VGG16_modified()
model.summary()
model.compile(optimizer=Adam(),loss='binary_crossentropy',metrics = ['accuracy'])
model.fit(x_train, y_train, epochs=100, validation_data=(x_val, yval), batch_size=4)
model.save('CAMVGG16trainall.h5')
model.save_weights('CAMVGG16weightstrainall.h5')

Later I loaded the model and tried to predict the labels for the same validation data set.

model = load_model('CAMVGG16trainall.h5')
preds = model.predict(x_val)

But I am getting the same output for every image. The output has the form [[0 0 0 ......1 0 0 0...]]. I have tried more epochs, fewer epochs, setting a few layers non-trainable, setting all layers trainable, changing the learning rate, using a different optimizer (SGD), and training from scratch without the ImageNet weights, but none of these gives correct results. Can anyone tell me where I have gone wrong?

How do you define the accuracy? I’m curious about it because it’s a multi-label classification problem. I think average precision or recall would make more sense. — zihaozhihao, Oct 04 '19 at 02:31
I strongly suspect that your images are not preprocessed the same way at prediction time as during training or validation. Make sure the input values are in the same range, such as (0,1). — zihaozhihao, Oct 04 '19 at 02:34
Have you applied any augmentation techniques on training images? — Kaushik Roy, Oct 04 '19 at 02:34
@bit01, no, the dataset contains 5717 training images and 5832 validation images. But even without data augmentation, I believe it should at least perform far better than predicting the same class every time. Also, I don't understand the reason for the high validation accuracy. — Sree, Oct 04 '19 at 02:48
@zihaozhihao, you mean I have to scale the input image with img = img/255 so that all the pixels are within 0 to 1? I have done that as well, but to no avail. I have also tried passing the image through keras.applications.vgg16.preprocess_input(img), but the issue is the same. — Sree, Oct 04 '19 at 02:50
@zihaozhihao , I followed this link: https://www.analyticsvidhya.com/blog/2019/04/build-first-multi-label-image-classification-model-python/ — Sree, Oct 04 '19 at 02:52
@Sree In your model you are aggressively down-sampling your features by using two consecutive pooling layers: you first get the features from the `MaxPooling2D` layer and then feed them into a `GlobalAveragePooling2D` layer, and to me this is the bottleneck. Have you tried flattening the `MaxPooling2D` layer's output with `Flatten` instead of using `GlobalAveragePooling2D`? — Kaushik Roy, Oct 04 '19 at 03:20
I don't think 90% accuracy is convincing, because most of the label values are zero. If the average number of ground-truth labels is 3, then even if everything is predicted as zero you still get 22/25 = 0.88 accuracy. — zihaozhihao, Oct 04 '19 at 03:31
@zihaozhihao, Got it, that might be true. So does that mean the model is not learning anything at all? What can I change? — Sree, Oct 04 '19 at 03:35
@bit01, Well, I want to go on to get the class activation map for each image, so I have used `GlobalAveragePooling2D`. Maybe I could remove the preceding `MaxPooling2D`. — Sree, Oct 04 '19 at 03:37
@Sree try to use recall or precision to monitor the training. — zihaozhihao, Oct 04 '19 at 03:38
The correct image preprocessing may help you. You can try using the method provided in Keras `from keras.applications.vgg16 import preprocess_input` — Shubham Panchal, Oct 04 '19 at 04:10
@zihaozhihao `precision: 0.0000e+00 - recall: 0.0000e+00` is the precision and recall every time, which clearly shows the model is not learning anything. — Sree, Oct 04 '19 at 04:11
@ShubhamPanchal, I did that as well: after `img_to_array`, I used `preprocess_input`, but it made no difference. As I see from the precision and recall, the model is not learning anything. — Sree, Oct 04 '19 at 04:19
@Sree If you look carefully at the class distribution of the VOC 2012 dataset, you will find that the classes are not represented equally, i.e. the dataset is imbalanced. The `Person` class is very dominant. Class frequency: `[ 716 603 811 549 812 467 1284 1128 1366 340 691 1341 526 575 **9583** 613 357 742 589 645]`. Therefore, the model may always predict the `Person` class regardless of the sample, as it might have learned that always predicting `Person` achieves high accuracy. To handle an imbalanced dataset you might apply class weights or under/over-sampling. — Kaushik Roy, Oct 04 '19 at 06:31
@bit01, No, I think you deleted your comment. But as you mentioned in that comment, I didn't set layer.trainable=True and that's the issue. I thought layer.trainable was set to True by default. — Sree, Oct 04 '19 at 07:46
Good to know that you have found your problem. However, I explored the PASCAL VOC 2012 dataset a bit and found an imbalanced distribution of classes, which I shared here. — Kaushik Roy, Oct 04 '19 at 07:57
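
The point made in the comments, that plain accuracy can look good on sparse multi-label targets while precision and recall are zero, can be checked with a small sketch. The labels below are made up for illustration (they are not taken from PASCAL VOC), and the helper names are hypothetical. As far as I know, Keras resolves `'accuracy'` to an element-wise binary accuracy when the loss is `binary_crossentropy`, which is what `elementwise_accuracy` imitates here.

```python
NUM_CLASSES = 20


def multi_hot(indices):
    """Build a 20-slot 0/1 label vector from a set of class indices."""
    return [1 if i in indices else 0 for i in range(NUM_CLASSES)]


def elementwise_accuracy(y_true, y_pred):
    """Fraction of individual label slots that match."""
    total = sum(len(row) for row in y_true)
    correct = sum(
        t == p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return correct / total


def precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall over all label slots."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Four made-up samples with 1 to 3 positive labels each (7 positives in 80 slots).
y_true = [
    multi_hot({14}),
    multi_hot({6, 14}),
    multi_hot({8, 11, 14}),
    multi_hot({0}),
]

# A "model" that has learned nothing and predicts all zeros for every image.
all_zeros = [[0] * NUM_CLASSES for _ in y_true]

acc = elementwise_accuracy(y_true, all_zeros)
prec, rec = precision_recall(y_true, all_zeros)
print(f"accuracy={acc:.4f} precision={prec:.1f} recall={rec:.1f}")
# prints: accuracy=0.9125 precision=0.0 recall=0.0
```

With only 7 positive slots out of 80, predicting nothing at all already scores above 90% accuracy, which is consistent with the high validation accuracy and the zero precision and recall reported above.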

score 1 · Answer 1 · 2019-12-02T10:12:25.507

I am stating the resolution here for the benefit of the community, since it is spread across many comments.

The issue was that the model was frozen, i.e., its layers were not being trained on the PASCAL VOC dataset.

The weights of the pre-trained model should be frozen, while the weights of the layers that are to be trained on our dataset should not be.

The issue was resolved by setting layer.trainable = True. This can be better understood from the screenshot below.

Note: The image is taken from Aurélien Géron's book on Machine Learning and Deep Learning.
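
A minimal sketch of how the `trainable` flags could be set and inspected is shown here. It is an assumption-laden illustration, not the asker's actual code (the code that froze the layers is not shown in the question): it assumes TensorFlow 2.x with its bundled Keras, and `build_multilabel_vgg16` and `trainable_report` are hypothetical helper names. It also uses `include_top=False`, so that `block5_pool` is the last layer of the base.

```python
# Sketch only: assumes TensorFlow 2.x and its bundled Keras (tensorflow.keras).
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model


def build_multilabel_vgg16(num_classes=20, weights="imagenet", train_base=True):
    """Hypothetical helper: VGG16 conv base + global average pooling + sigmoid head."""
    base = VGG16(include_top=False, weights=weights, input_shape=(224, 224, 3))
    # Set the flag explicitly on every base layer instead of relying on defaults.
    for layer in base.layers:
        layer.trainable = train_base
    x = GlobalAveragePooling2D()(base.output)
    # The new head is left trainable so it can learn the 20 VOC classes.
    outputs = Dense(num_classes, activation="sigmoid", name="predictions")(x)
    model = Model(inputs=base.input, outputs=outputs)
    # Compile after setting the flags; if `trainable` is changed later,
    # the model has to be compiled again for the change to take effect.
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


def trainable_report(model):
    """Return (layer name, trainable flag) pairs for a quick sanity check."""
    return [(layer.name, layer.trainable) for layer in model.layers]


if __name__ == "__main__":
    model = build_multilabel_vgg16()
    for name, flag in trainable_report(model):
        print(f"{name:20s} trainable={flag}")
```

Printing the report before calling `fit` makes it easy to spot a model in which every layer, including the new `Dense` head, has been frozen by mistake. Passing `train_base=False` gives the usual transfer-learning setup in which only the new head is trained.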
