from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Flatten, Dense

# VGG16 convolutional base without its top FC layers
pre_trained_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

model = Sequential()
model.add(pre_trained_model)
model.add(GlobalAveragePooling2D())
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.summary()

The total number of parameters in VGG16 is 138 million. However, model.summary() here reports only 14,977,857. Can anyone explain why there is this difference in the total number of parameters? Even if I check the number of parameters of pre_trained_model alone, it is also not equal to 138 million.

1 Answer


You have the include_top=False parameter set, which drops the top FC layers of VGG16. If you set include_top=True and check pre_trained_model.summary(), you will see these lines at the bottom:

 flatten (Flatten)           (None, 25088)             0

 fc1 (Dense)                 (None, 4096)              102764544

 fc2 (Dense)                 (None, 4096)              16781312

 predictions (Dense)         (None, 1000)              4097000

=================================================================
Total params: 138,357,544
Trainable params: 138,357,544

And now you have the desired 138M parameters.
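For reference, a minimal sketch of how to check this yourself (the variable name full_model is just for illustration; the ImageNet weights are downloaded on first use):

from tensorflow.keras.applications import VGG16

# Full VGG16, including the three FC layers on top
full_model = VGG16(weights='imagenet', include_top=True)
full_model.summary()  # Total params: 138,357,544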

The lesson learned here: the majority of the parameters in this network actually come from the FC layers. Incidentally, this fact once again demonstrates how lightweight convolutional layers are compared with FC ones.
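A quick back-of-the-envelope check of that claim, using the parameter counts from the summary above:

# Parameter counts taken from the model.summary() output above
fc_params = 102_764_544 + 16_781_312 + 4_097_000   # fc1 + fc2 + predictions
total_params = 138_357_544
conv_params = total_params - fc_params              # the convolutional base

print(f"FC layers: {fc_params:,} ({fc_params / total_params:.0%} of the total)")
print(f"Conv base: {conv_params:,} ({conv_params / total_params:.0%} of the total)")
# FC layers: 123,642,856 (89% of the total)
# Conv base: 14,714,688 (11% of the total)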

Evgeny Kovalev
  • Thanks a lot for helping out. My model gives good accuracy with include_top=False. Can you please explain when we should prefer include_top=False and when it should be True? – nikita shah Apr 30 '22 at 08:13
  • That is, what happens when we drop the top FC layers of Inception? I am working on violence detection using CNN. – nikita shah Apr 30 '22 at 08:21
  • CNNs for classification generally consist of a feature extractor (the Conv layers) and a classifier (the FC layers). Setting `include_top=False` drops the classifier. This way, you can add your own FC layers and train them to solve your specific classification problem, while keeping the pre-trained feature extractor. In other words, use `include_top=False` to fine-tune the CNN. Conversely, I suppose you want `include_top=True` when you need the original network architecture, or want to solve a task similar to the one it was pre-trained on. – Evgeny Kovalev May 04 '22 at 08:16
  • Personally, I think that this stack of three FC layers on top of VGG16 is one of the weak points of this CNN. After discussing your question, we can see that the majority of VGG16's parameters come from the classifier (a ridiculous 89%!). Three FC layers just seem to be too much for this architecture. In comparison, the ResNet50 architecture, which is [stronger and lighter](https://keras.io/api/applications/), contains only one FC layer at the end with ~2M parameters (8% of the whole network). – Evgeny Kovalev May 04 '22 at 08:28
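To make the fine-tuning workflow described in the comments above concrete, here is a minimal sketch: the convolutional base is frozen and only the newly added FC layers are trained. The layer sizes and the single sigmoid output (e.g. for a binary violence / no-violence task) are illustrative assumptions, not something prescribed by the original answer.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Pre-trained feature extractor without the original classifier
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base

# New, task-specific classifier on top (sizes are illustrative)
model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid'),  # binary output, e.g. violence vs. no violence
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()  # only the new Dense layers are listed as trainable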