The VGG16 architecture takes 224x224x3 images as input. I want to use 48x48x3 inputs, but to do this in Keras we remove the last FC layers, which have 4096 neurons each. Why do we have to do this? And do we need to add FC layers of a different size for this input?
2 Answers
The final pooling layer of VGG16 has dimension 7x7x512 for a 224x224 input image. From there, VGG16 uses a fully connected layer of (7x7x512)x4096 to produce a 4096-dimensional output. However, since your input size is different, the feature output of the final pooling layer will also have a different dimension: five 2x2 poolings reduce 48 to 24, 12, 6, 3, and finally 1, giving 1x1x512. So you need to change the matrix dimension of that fully connected layer to make it work. You have two other options, though (see the sketch after this list):

- Use global average pooling across the spatial dimensions to get a 512-dimensional feature, then use a few fully connected layers to get down to your number of classes.
- Resize your input images to 224x224x3, and you won't need to change anything in the model architecture.
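For concreteness, here is a minimal sketch of the global-average-pooling option, assuming TensorFlow 2.x Keras; `num_classes = 10` is a hypothetical placeholder for your own label count:

```python
# Minimal sketch: adapt VGG16 to 48x48x3 inputs by replacing the FC head.
# Assumes TensorFlow 2.x; num_classes = 10 is a hypothetical placeholder.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # hypothetical: set to your number of categories

# Convolutional base only: include_top=False drops the (7*7*512)x4096 FC layers.
base = tf.keras.applications.VGG16(
    include_top=False,
    weights="imagenet",
    input_shape=(48, 48, 3),  # Keras accepts any size >= 32x32 here
)

x = layers.GlobalAveragePooling2D()(base.output)  # -> 512-dim feature vector
x = layers.Dense(256, activation="relu")(x)       # new, smaller FC layer
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = models.Model(base.input, outputs)
model.summary()
```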

Removing the last FC layers is for fine-tuning or transfer learning, where you adapt an existing network to a new problem, such as changing the number of categories that your classifier can choose between.
You are also adapting the network to take a different-sized input, so you need to adjust the first layer(s) of the network, for example as in the sketch below.
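As a self-contained sketch of that idea, the snippet below keeps VGG16's standard 224x224 geometry, upsamples 48x48 images on the way in with a `Resizing` first layer, and freezes the pretrained base for transfer learning. It assumes TensorFlow 2.6+ (where `layers.Resizing` is available); `num_classes` is again a hypothetical placeholder:

```python
# Sketch of the resize option: feed 48x48 images into the standard 224x224
# VGG16 by upsampling first, and freeze the base for transfer learning.
# Assumes TensorFlow 2.6+; num_classes = 10 is a hypothetical placeholder.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # hypothetical

inputs = layers.Input(shape=(48, 48, 3))
x = layers.Resizing(224, 224)(inputs)  # adjust the first layer(s): upsample input

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
base.trainable = False  # keep the pretrained conv filters fixed

x = base(x)
x = layers.Flatten()(x)  # 7x7x512 -> 25088, matching stock VGG16's FC input
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```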
