Replacing a Keras layer in a pretrained model with another layer

Question

I am using Keras with TensorFlow 2.7 as the backend. I am referring to the Stack Overflow post Removing then Inserting a New Middle Layer in a Keras Model. I aim to instantiate an ImageNet-pretrained VGG16 model and replace every MaxPooling2D layer with an AveragePooling2D layer:

import os
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import *
from tensorflow.keras import applications
from tensorflow.keras.models import Model
model_input = (224,224,3)
model = applications.VGG16(include_top=False,
                           weights='imagenet',
                           input_shape=model_input)
model.summary()

for layer in tuple(model.layers):
    layer_type = type(layer).__name__
    if layer.__name__ == 'MaxPooling2D':
        pool_name = layer.name + "_averagepooling2d"
        pool = AveragePooling2D() if layer_type == "MaxPooling2D" else pool(name=pool_name)
        model.add(pool)        

model.summary()

I get the following error:

  File "C:\Users\AppData\Local\Temp\2/ipykernel_26864/1200445239.py", line 15, in <module>
    if layer.__name__ == 'MaxPooling2D':

AttributeError: 'InputLayer' object has no attribute '__name__'
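The AttributeError comes from layer.__name__: __name__ is defined on classes, not on their instances, so a layer object has no such attribute. The class name is available as type(layer).__name__ (which the code above already computes as layer_type), and isinstance(layer, MaxPooling2D) is the more robust check because it also matches subclasses. The stand-in classes below are hypothetical and only mimic Keras' layer classes; the behavior shown is plain Python. Note also that add() exists only on Sequential models, while keras.applications.VGG16 returns a functional Model, so the pooling layers cannot be swapped in place this way; the answers below rebuild the model instead.

```python
# Hypothetical stand-ins for Keras layer classes (InputLayer, MaxPooling2D).
class Layer:
    def __init__(self, name):
        self.name = name  # instance attribute, like a Keras layer's .name


class InputLayer(Layer):
    pass


class MaxPooling2D(Layer):
    pass


layers = [InputLayer("input_1"), MaxPooling2D("block1_pool")]

# __name__ lives on the class, not on the instance.
print(hasattr(InputLayer, "__name__"))  # True
print(hasattr(layers[0], "__name__"))   # False -> the AttributeError above

# Two working ways to test a layer's type:
by_name = [l.name for l in layers if type(l).__name__ == "MaxPooling2D"]
by_isinstance = [l.name for l in layers if isinstance(l, MaxPooling2D)]
print(by_name, by_isinstance)  # ['block1_pool'] ['block1_pool']
```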

Also, I am not sure whether this is the right way to replace the MaxPooling2D layers with AveragePooling2D layers in all types of pretrained models, including those with skip connections and dense blocks. I would appreciate a corrected version of the code.

Take a look at https://stackoverflow.com/a/45309508/15239951 — Corralien, Jan 22 '23 at 15:25

learner · Accepted Answer · 2023-01-22T18:06:30.040

Here is one of the approaches to accomplish that:

Original VGG16

import tensorflow as tf
vgg=tf.keras.applications.vgg16.VGG16(
    include_top=True,
    weights='imagenet',
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation='softmax'
)
vgg.summary()


Modified VGG16

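# Use one of the following two inputs:
# model_input = vgg.input
model_input = tf.keras.Input(shape=(224, 224, 3))

# Re-apply the pretrained layers in order, skipping vgg.layers[0] (the original
# InputLayer). Each MaxPooling2D is replaced by an AveragePooling2D built from the
# same config (name, pool_size, strides, padding, data_format).
x = model_input
for layer in vgg.layers[1:]:
    if isinstance(layer, tf.keras.layers.MaxPooling2D):
        kwargs = layer.get_config()
        x = tf.keras.layers.AveragePooling2D(**kwargs)(x)
    else:
        x = layer(x)  # reuses the layer object, so its pretrained weights are shared
model = tf.keras.Model(inputs=model_input, outputs=x, name="vgg_avg")
model.summary()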


Notes

Replacing the MaxPooling2D layers with their AveragePooling2D counterparts changes the activations that reach the following layers, so the originally optimized weights may no longer be optimal. Some fine-tuning (with a small learning rate) may therefore be needed.
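To see why the pretrained weights may no longer fit, here is a small framework-free comparison of 2x2, stride-2 max and average pooling on one made-up, non-negative (post-ReLU style) activation map. The values are invented for illustration; the point is that average pooling dilutes sparse strong activations, so the next layer receives differently scaled inputs than it was trained on.

```python
def pool2x2(fmap, reduce):
    """2x2 pooling with stride 2 on a 2D list; height and width assumed even."""
    return [
        [
            reduce([fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1]])
            for c in range(0, len(fmap[0]), 2)
        ]
        for r in range(0, len(fmap), 2)
    ]


def mean(vals):
    return sum(vals) / len(vals)


# Made-up 4x4 activation map (non-negative, like a ReLU output).
fmap = [
    [0, 4, 1, 0],
    [0, 0, 3, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 8],
]

print(pool2x2(fmap, max))   # [[4, 3], [2, 8]]
print(pool2x2(fmap, mean))  # [[1.0, 1.0], [2.0, 2.0]]
```

Uniform regions (the bottom-left block) come out the same either way, while isolated peaks (4 and 8) shrink by a factor of four.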

score 0 · Answer 2 · answered Jan 22 '23 at 16:15

Keras lets you define a custom model in which you choose the layers according to your requirements (Keras Custom Model). Although the VGG16 constructor has a pooling argument, it only selects global pooling: it applies only to the output of the last convolutional block, and only when include_top=False (Keras VGG16).
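What the pooling argument does can be shown without Keras: with include_top=False, pooling="avg" or pooling="max" appends a GlobalAveragePooling2D or GlobalMaxPooling2D layer after block5_pool (the else branch of the code below does exactly that), collapsing each channel's H x W map to a single number. The intermediate block*_pool layers stay MaxPooling2D. The sketch below mimics that reduction on a tiny made-up H x W x C feature map stored as nested lists; global_pool is a hypothetical helper, not a Keras function.

```python
def global_pool(fmap, reduce):
    """Collapse an H x W x C feature map (nested lists) to C values."""
    channels = len(fmap[0][0])
    return [
        reduce([pixel[ch] for row in fmap for pixel in row])
        for ch in range(channels)
    ]


def mean(vals):
    return sum(vals) / len(vals)


# Made-up 2 x 2 x 2 feature map: fmap[row][col][channel].
fmap = [
    [[1, 10], [3, 0]],
    [[5, 2], [7, 4]],
]

print(global_pool(fmap, mean))  # [4.0, 4.0]
print(global_pool(fmap, max))   # [7, 10]
```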

One can define the custom model as follows:

from tensorflow import keras
from tensorflow.keras import backend as K
# keras.engine, imagenet_utils and layer_utils are internal Keras modules
# (valid for Keras 2.7); their locations may differ in other versions.
from keras.engine import training
from keras import layers
from keras.layers import Dense
from keras.applications import imagenet_utils
from keras.utils import layer_utils

def MyVgg16Model(include_top=True,
                 weights="imagenet",
                 input_tensor=None,
                 input_shape=None,
                 pooling=None,
                 classes=1000,
                 classifier_activation="softmax"):
    # Validate/derive the input shape the same way keras.applications.VGG16 does.
    input_shape = imagenet_utils.obtain_input_shape(
        input_shape,
        default_size=224,
        min_size=32,
        data_format=K.image_data_format(),
        require_flatten=include_top,
        weights=weights,
    )

    if input_tensor is None:
        img_input = layers.Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = layers.Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

    # Block 1
    x = layers.Conv2D(
        64, (3, 3), activation="relu", padding="same", name="block1_conv1"
    )(img_input)
    x = layers.Conv2D(
        64, (3, 3), activation="relu", padding="same", name="block1_conv2"
    )(x)
    x = layers.AveragePooling2D((2, 2), strides=(2, 2), name="block1_pool")(x)

    # Block 2
    x = layers.Conv2D(
        128, (3, 3), activation="relu", padding="same", name="block2_conv1"
    )(x)
    x = layers.Conv2D(
        128, (3, 3), activation="relu", padding="same", name="block2_conv2"
    )(x)
    x = layers.AveragePooling2D((2, 2), strides=(2, 2), name="block2_pool")(x)

    # Block 3
    x = layers.Conv2D(
        256, (3, 3), activation="relu", padding="same", name="block3_conv1"
    )(x)
    x = layers.Conv2D(
        256, (3, 3), activation="relu", padding="same", name="block3_conv2"
    )(x)
    x = layers.Conv2D(
        256, (3, 3), activation="relu", padding="same", name="block3_conv3"
    )(x)
    x = layers.AveragePooling2D((2, 2), strides=(2, 2), name="block3_pool")(x)

    # Block 4
    x = layers.Conv2D(
        512, (3, 3), activation="relu", padding="same", name="block4_conv1"
    )(x)
    x = layers.Conv2D(
        512, (3, 3), activation="relu", padding="same", name="block4_conv2"
    )(x)
    x = layers.Conv2D(
        512, (3, 3), activation="relu", padding="same", name="block4_conv3"
    )(x)
    x = layers.AveragePooling2D((2, 2), strides=(2, 2), name="block4_pool")(x)

    # Block 5
    x = layers.Conv2D(
        512, (3, 3), activation="relu", padding="same", name="block5_conv1"
    )(x)
    x = layers.Conv2D(
        512, (3, 3), activation="relu", padding="same", name="block5_conv2"
    )(x)
    x = layers.Conv2D(
        512, (3, 3), activation="relu", padding="same", name="block5_conv3"
    )(x)
    x = layers.AveragePooling2D((2, 2), strides=(2, 2), name="block5_pool")(x)

    if include_top:
        # Classification block
        x = layers.Flatten(name="flatten")(x)
        x = Dense(4096, activation="relu", name="fc1")(x)
        x = Dense(4096, activation="relu", name="fc2")(x)

        imagenet_utils.validate_activation(classifier_activation, weights)
        x = layers.Dense(
            classes, activation=classifier_activation, name="predictions"
        )(x)
    else:
        if pooling == "avg":
            x = layers.GlobalAveragePooling2D()(x)
        elif pooling == "max":
            x = layers.GlobalMaxPooling2D()(x)

    if input_tensor is not None:
        inputs = layer_utils.get_source_inputs(input_tensor)
    else:
        inputs = img_input
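    # Create model.
    model = training.Model(inputs, x, name="vgg16")

    # Building the layers does not load any weights, so copy them explicitly.
    # Pooling layers have no weights, so the weight list has the same order and
    # shapes as the stock VGG16 and its ImageNet weights can be reused directly.
    if weights == "imagenet":
        reference = keras.applications.VGG16(include_top=include_top, weights="imagenet")
        model.set_weights(reference.get_weights())
    elif weights is not None:
        model.load_weights(weights)

    return model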

model_input = (224, 224, 3)
model = MyVgg16Model(include_top=False,
                     weights='imagenet',
                     input_shape=model_input)
model.summary()

The model summary shows that every block*_pool layer is now an AveragePooling2D layer:

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0

 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792

 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928

 block1_pool (AveragePooling  (None, 112, 112, 64)     0
 2D)

 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856

 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584

 block2_pool (AveragePooling  (None, 56, 56, 128)      0
 2D)

 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168

 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080

 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080

 block3_pool (AveragePooling  (None, 28, 28, 256)      0
 2D)

 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160

 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808

 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808

 block4_pool (AveragePooling  (None, 14, 14, 512)      0
 2D)

 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   

 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808

 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808

 block5_pool (AveragePooling  (None, 7, 7, 512)        0
 2D)

=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
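The total of 14,714,688 parameters is the same as for the stock VGG16 with include_top=False, because pooling layers (max or average) have no weights. This is what makes it possible to reuse the ImageNet weights unchanged. As a quick arithmetic check, each 3x3 Conv2D layer has 3 * 3 * c_in * c_out kernel weights plus c_out biases; conv2d_params is a hypothetical helper written only for this check.

```python
def conv2d_params(c_in, c_out, k=3):
    # k x k kernel for every (input, output) channel pair, plus one bias per output channel
    return k * k * c_in * c_out + c_out


# Channel counts along the 13 conv layers: RGB input, then blocks 1-5.
channels = [3, 64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
total = sum(conv2d_params(a, b) for a, b in zip(channels, channels[1:]))

print(conv2d_params(3, 64))  # 1792, matches block1_conv1
print(total)                 # 14714688, pooling layers add 0
```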

This is excellent! But for bigger models with several hundred layers, such as EfficientNet-B7, we may not be able to write out the model architecture by hand in a new function. How do we replace the layers in that case? — shiva, Jan 22 '23 at 16:40
Keras provides an elegant, high-level API for deep learning applications. Behind each layer there is backend code (mathematical formulations, programming constructs, etc.) that implements it. Even EfficientNet is defined by a single function that builds the architecture, which is then scaled up to B7 (https://github.com/keras-team/keras/blob/master/keras/applications/efficientnet.py). Many more complicated architectures are written using the Keras Model class (https://keras.io/api/models/model/). After all, each contribution matters. — Ipvikukiepki-KQS, Jan 22 '23 at 19:30
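For large models with branches and skip connections (ResNet, DenseNet, EfficientNet), the sequential loop in the accepted answer, which feeds every layer the previous layer's output, does not preserve the graph. Instead, each layer has to be called on the rebuilt outputs of its own inbound layers. The framework-free sketch below shows only that bookkeeping on a hypothetical toy graph; the dict-based graph, run and the toy layer functions are illustrative stand-ins, not Keras API. Applying it to a real Keras model additionally requires reading each layer's inbound connections from the model's graph, which is not shown here.

```python
# Toy "layers" (hypothetical): each takes the list of its input vectors
# and returns one output vector.
def max_pool(xs):
    (x,) = xs
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def avg_pool(xs):
    (x,) = xs
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def double(xs):
    (x,) = xs
    return [2 * v for v in x]


def add(xs):
    return [sum(vals) for vals in zip(*xs)]


# Toy graph: layer name -> (layer function, names of its inbound layers),
# listed so every layer comes after its inputs. "skip_add" merges the output
# of "pool1" with the output of "double" (a residual-style skip connection).
graph = {
    "pool1": (max_pool, ["input"]),
    "double": (double, ["pool1"]),
    "skip_add": (add, ["pool1", "double"]),
    "pool2": (max_pool, ["skip_add"]),
}


def run(graph, x, swap=None):
    """Evaluate graph on x, replacing layer functions according to swap.

    Each layer is fed the outputs of its own inbound layers (looked up by
    name), so branches and skip connections survive the rebuild.
    Returns the output of the last layer in graph.
    """
    swap = swap or {}
    outputs = {"input": x}
    for name, (fn, inbound) in graph.items():
        fn = swap.get(fn, fn)
        outputs[name] = fn([outputs[src] for src in inbound])
    return outputs[name]


x = [1, 3, 2, 8, 5, 4, 0, 6]
print(run(graph, x))                        # [24, 18]
print(run(graph, x, {max_pool: avg_pool}))  # [10.5, 11.25]
```

The key design choice is keeping a name-to-output dictionary instead of a single running x: a layer with two inputs, like skip_add, can then look up both of them.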
