
I'm trying to fine-tune a modified InceptionV3 model in Keras.

I followed the example "Fine-tune InceptionV3 on a new set of classes" from the Keras applications documentation.

So I first trained the top dense layers that were added to the InceptionV3 base model with the following code:

model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False

parallel_model = multi_gpu_model(model, gpus=2)

parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)

After that, I want to fine-tune the top 2 inception blocks of InceptionV3. According to the example, what I should do is:

for layer in model.layers[:249]:
   layer.trainable = False
for layer in model.layers[249:]:
   layer.trainable = True

model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

model.fit_generator(...)

But I'm using the multi_gpu_model, so I don't know how to freeze the first 249 layers.

I mean, if I freeze the layers in the single-GPU model (as in the example) and then call parallel_model = multi_gpu_model(model, gpus=2) again to get a parallel model with those layers frozen, won't the weights of the top dense layers that I just trained inside parallel_model be overwritten?

On the other hand, I tried to directly run for layer in parallel_model.layers[:249]: layer.trainable = False, but when I enumerated the layers in parallel_model it showed:

for i, layer in enumerate(parallel_model.layers):
   print(i, layer.name)

(0, 'input_1')
(1, 'lambda_1')
(2, 'lambda_2')
(3, 'model_1')
(4, 'dense_3')

So what are the 'lambda_1', 'lambda_2' and 'model_1' layers, and why does parallel_model show only 5 layers?

More importantly, how do I freeze the layers in parallel_model?

chaohuang

1 Answer


This example is a little complicated since you're nesting a base model

base_model = InceptionV3(weights='imagenet', include_top=False)

into a model that adds your own dense layer,

model = Model(inputs=base_model.input, outputs=predictions)

and then calling multi_gpu_model, which nests the model yet again: it adds Lambda layers that split each input batch into one slice per GPU (the lambda_1 and lambda_2 layers you saw), runs your whole model on each slice (the nested model_1 layer), and concatenates the outputs back together, which is how the work gets distributed over multiple GPUs.

parallel_model = multi_gpu_model(model, gpus=2)
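
You can see this nesting if you look inside parallel_model: the Lambda layers slice each input batch (one slice per GPU), and model_1 is your original model reused as a single layer, with all of its own layers still inside it. A small diagnostic sketch, assuming the parallel_model built above; with the stock multi_gpu_model, that nested layer should be the very same object as your original model:

from keras import Model

# The original model is reused as a layer inside parallel_model,
# so its layers (and their trainable flags) are still reachable.
inner_model = next(l for l in parallel_model.layers if isinstance(l, Model))
print(inner_model is model)        # should print True: same object, shared weights
print(len(inner_model.layers))     # all of the original InceptionV3 + dense layers
for i, layer in enumerate(inner_model.layers[-5:]):
    print(i, layer.name, layer.trainable)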

In this situation, remember two things: change layer trainability on your base_model (the parallel model shares those same layers), and instantiate the non-parallel template model on the CPU for best performance.

Here is the full fine-tuning example; just update train_data_dir to point to your own data location.

import tensorflow as tf
from keras import Model
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import multi_gpu_model

train_data_dir = '/home/ubuntu/work/data/train'
batch_size_per_gpu = 32
nb_classes = 3
my_gpus = 2
target_size = (224, 224)
num_epochs_to_fit_dense_layer = 2
num_epochs_to_fit_last_two_blocks = 3

batch_size = batch_size_per_gpu * my_gpus
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_iterator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=target_size,
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)

# Check to make sure our model will match our data
assert nb_classes == train_iterator.num_classes

# Create base and template models on cpu
with tf.device('/cpu:0'):
    base_model = InceptionV3(weights='imagenet', include_top=False)
    for layer in base_model.layers:
        layer.trainable = False

    # Add prediction layer to base pre-trained model
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation='relu')(x)
    predictions = Dense(nb_classes, activation='softmax')(x)

    template_model = Model(inputs=base_model.input, outputs=predictions)

    # If you need to load weights from previous training, do so here:
    # template_model.load_weights('template_model.h5', by_name=True)

# Create parallel model on GPUs
parallel_model = multi_gpu_model(template_model, gpus=my_gpus)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')

# Train parallel model.
history = parallel_model.fit_generator(
    train_iterator,
    steps_per_epoch=train_iterator.n // batch_size,
    epochs=num_epochs_to_fit_dense_layer)

# Unfreeze some layers in our model
for layer in base_model.layers[:249]:
    layer.trainable = False
for layer in base_model.layers[249:]:
    layer.trainable = True

# Train parallel_model with more trainable layers
parallel_model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
history2 = parallel_model.fit_generator(
    train_iterator,
    steps_per_epoch=train_iterator.n // batch_size,
    epochs=num_epochs_to_fit_last_two_blocks)

# Save model via the template model which shares the same weights as the parallel model.
template_model.save('template_model.h5')
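
If you want to convince yourself that saving through template_model is safe, and that the unfreeze actually took effect, here is a purely diagnostic sketch using the objects defined above (none of this is needed for training, and the exact layer bookkeeping may vary slightly between Keras versions):

import numpy as np
from keras import Model, backend as K

# multi_gpu_model reuses the template model as a layer, so the nested copy
# inside parallel_model is the same object and shares the same weights.
inner_model = next(l for l in parallel_model.layers if isinstance(l, Model))
print(inner_model is template_model)   # should print True

# Unfreezing layers 249+ should have moved parameters from the frozen bucket
# to the trainable bucket (this only affects training after parallel_model
# is recompiled).
trainable_count = int(np.sum([K.count_params(w) for w in template_model.trainable_weights]))
frozen_count = int(np.sum([K.count_params(w) for w in template_model.non_trainable_weights]))
print('trainable params:', trainable_count, 'frozen params:', frozen_count)
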
mboss