Hi everyone,

I have a question about how to modify the pre-trained VGG16 network in Keras. I want to remove the max-pooling layers that follow the last three convolutional blocks and add a batch normalization layer after each convolutional layer, while keeping the pre-trained parameters. This means the modification involves not only removing some middle layers and adding some new ones, but also reconnecting the modified layers to the remaining layers.

I'm still very new to Keras. The only approach I could find is the one shown in Removing then Inserting a New Middle Layer in a Keras Model.

The code I edited is below:

from keras import applications
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers.normalization import BatchNormalization
vgg_model = applications.VGG16(weights='imagenet',
                           include_top=False,
                           input_shape=(160, 80, 3))
# Disassemble layers
layers = [l for l in vgg_model.layers]

# Take the output of block3_conv3 as the new branching point;
# blocks 4 and 5 will be re-applied to it without the pooling layers.
# Note: the receptive field of two 3x3 convolutions is 5x5.
layer_dict = dict([(layer.name, layer) for layer in vgg_model.layers])
x = layer_dict['block3_conv3'].output

# Re-apply the block4 convolutions (layers 11-13), skipping block3_pool
for i in range(11, len(layers)-5):
    # layers[i].trainable = False
    x = layers[i](x)

# Re-apply the block5 convolutions (layers 15-17), skipping block4_pool
for j in range(15, len(layers)-1):
    # layers[j].trainable = False
    x = layers[j](x)

# New head: three 1x1 conv + batch normalization pairs, then the classifier
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Flatten()(x)
x = Dense(50, activation='softmax')(x)


custom_model = Model(inputs=vgg_model.input, outputs=x)
# Freeze the pre-trained VGG layers
for layer in custom_model.layers[:16]:
    layer.trainable = False

custom_model.summary()

However, the output shape of the convolutional layers in blocks 4 and 5 shows as "multiple". I tried to correct this by adding a MaxPooling2D(pool_size=(1, 1), strides=None) layer, but the output shapes are still "multiple":

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 160, 80, 3)        0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 160, 80, 64)       1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 160, 80, 64)       36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 80, 40, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 80, 40, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 80, 40, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 40, 20, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 40, 20, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 40, 20, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 40, 20, 256)       590080    
_________________________________________________________________
block4_conv1 (Conv2D)        multiple                  1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        multiple                  2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        multiple                  2359808   
_________________________________________________________________
block5_conv1 (Conv2D)        multiple                  2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        multiple                  2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        multiple                  2359808   
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 40, 20, 128)       65664     
_________________________________________________________________
batch_normalization_1 (Batch (None, 40, 20, 128)       512       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 40, 20, 128)       16512     
_________________________________________________________________
batch_normalization_2 (Batch (None, 40, 20, 128)       512       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 40, 20, 128)       16512     
_________________________________________________________________
batch_normalization_3 (Batch (None, 40, 20, 128)       512       
_________________________________________________________________
flatten_1 (Flatten)          (None, 102400)            0         
_________________________________________________________________
dense_1 (Dense)              (None, 50)                5120050   
=================================================================
Total params: 19,934,962
Trainable params: 5,219,506
Non-trainable params: 14,715,456
_________________________________________________________________

Can anyone provide some suggestions on how to achieve this?

Thanks very much.

LUCY

1 Answer


The "multiple" output shape appears because those layers were called twice, so they have two output shapes. You can see in the Keras source code that when calling layer.output_shape raises an AttributeError, the printed output shape is "multiple".

If you call custom_model.layers[10].output_shape, you will get this error:
AttributeError: The layer "block4_conv1" has multiple inbound nodes, with different output shapes. Hence the notion of "output shape" is ill-defined for the layer. Use `get_output_shape_at(node_index)` instead.

If you call custom_model.layers[10].get_output_shape_at(0), you will get the output shape corresponding to the initial network, and with custom_model.layers[10].get_output_shape_at(1), you will get the output shape you are expecting.
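
For example (assuming, as in your summary, that index 10 is block4_conv1):

layer = custom_model.layers[10]      # block4_conv1
print(layer.get_output_shape_at(0))  # (None, 20, 10, 512): shape in the original VGG16 graph
print(layer.get_output_shape_at(1))  # (None, 40, 20, 512): shape in your rebuilt graph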

Let me also express some doubt about the intention of this modification: if you remove a MaxPooling layer and apply the next layer (number 11) to the output that came before the pooling, the learned filters are "expecting" an input with half the resolution, so they probably won't work well.

Imagine that one filter is "looking" for eyes and that eyes are usually 10 pixels wide; after removing the pooling, you would need a 20-pixel-wide eye to trigger the same activation in that layer.
My example is obviously over-simplistic and not accurate, but it shows why the original idea is questionable: you should either retrain the top of the model, keep the MaxPooling layers, or define a brand-new model on top of layer block3_conv3.
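
If you still want to implement the no-pooling architecture, here is a minimal sketch of one way to avoid the double-call problem entirely: create fresh Conv2D layers with the same configuration as blocks 4 and 5, apply them after block3_conv3, and copy the pre-trained weights into them, so no layer ends up with two inbound nodes. The "_nopool" names and the BatchNormalization placement are illustrative assumptions, and the resolution caveat above still applies:

from keras import applications
from keras.models import Model
from keras.layers import Conv2D, Flatten, Dense
from keras.layers.normalization import BatchNormalization

vgg_model = applications.VGG16(weights='imagenet', include_top=False,
                               input_shape=(160, 80, 3))
layer_dict = dict((layer.name, layer) for layer in vgg_model.layers)

# Branch off after block3_conv3 and rebuild blocks 4-5 without pooling
x = layer_dict['block3_conv3'].output
for name in ['block4_conv1', 'block4_conv2', 'block4_conv3',
             'block5_conv1', 'block5_conv2', 'block5_conv3']:
    old = layer_dict[name]
    config = old.get_config()
    config['name'] = name + '_nopool'   # avoid duplicate layer names
    new = Conv2D.from_config(config)
    x = new(x)                          # build the fresh layer first...
    new.set_weights(old.get_weights())  # ...then copy the pre-trained weights
    x = BatchNormalization()(x)

x = Flatten()(x)
x = Dense(50, activation='softmax')(x)
custom_model = Model(inputs=vgg_model.input, outputs=x)
custom_model.summary()  # no "multiple" output shapes any more

You can append the 1x1 convolution / batch normalization head from your question on top of x in the same way, before the Flatten.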

mpariente
  • Hi @mpariente, thanks for your answer, but I am a little confused about your explanation of why these layers are called twice. How can I see that in the model I constructed? I tried adding custom_model.layers[10].get_output_shape_at(1) directly after custom_model = Model(inputs=vgg_model.input, outputs=x) (not sure whether that is the right way), and I got this error: ValueError: Asked to get output shape at node 1, but the layer has only 1 inbound nodes. When I changed it to custom_model.layers[10].get_output_shape_at(0), there was no error, but – LUCY Jul 04 '18 at 00:33
  • the same "multiple"s are still there. Could you please explain a bit more? As for the intention of this network: it is a method from a published paper. Since no code is available, I am trying to reconstruct it to compare its performance with other methods. The authors deleted the max-pooling layers to increase the size of the output, for the sake of more accurate target localization, since the images they use are really small. I am not sure whether this step really benefits the final results, so I have to implement it to confirm its performance. Thanks again :) – LUCY Jul 04 '18 at 00:52