
I have a Keras Sequential model consisting of a few Dense layers. I set the trainable property of the whole model to False, but I see that the individual layers still have their trainable property set to True. Do I need to set each layer's trainable property to False individually as well? If so, what is the meaning of setting the trainable property to False on the whole model?

– user257330
  • Possible duplicate of [shouldn't model.trainable=False freeze weights under the model?](https://stackoverflow.com/questions/47204116/shouldnt-model-trainable-false-freeze-weights-under-the-model) – Dr. Snoopy Jun 19 '19 at 22:19
  • Sorry, but that does not answer this question explicitly. – user257330 Jun 20 '19 at 05:06

1 Answer


To answer this you need to take a look at the source code of Keras, and what you find there may surprise you: the Model class is itself derived (via the Network class) from the Layer class.
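You can verify this class hierarchy yourself. A minimal sketch (the intermediate classes, e.g. Network, differ between Keras/TensorFlow versions, but Layer is always an ancestor of Model):

import tensorflow as tf

# Walk the method resolution order of the Model class; the Layer
# class shows up among its ancestors (the intermediate classes
# vary between Keras/TensorFlow versions).
for cls in tf.keras.Model.__mro__:
    print(cls)

# Model is a subclass of Layer.
print(issubclass(tf.keras.Model, tf.keras.layers.Layer))  # True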

As noted above, it may be a bit surprising that a Keras model is derived from a Keras layer. But if you think about it, it is reasonable, since they share a lot of functionality (e.g. both take some inputs, perform some computations on them, produce some output, and update their internal weights/parameters). One of their common attributes is the `trainable` attribute. When you set the `trainable` property of a model to `False`, the weight-update step is skipped. In other words, the model does not check the `trainable` attribute of its underlying layers; rather, it first checks its own `trainable` attribute (more precisely, in the Network class), and if that is `False` the updates are skipped. Therefore, this does not mean that its underlying layers have their `trainable` attribute set to `False` as well. And there is a good reason for not doing that: a single instance of a layer could be used in multiple models. For example, consider the following two models, which share a layer:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(10,))      # illustrative input shape

shared_layer = Dense(32)      # illustrative number of units
sout = shared_layer(inp)

m1_out = Dense(1)(sout)
m2_out = Dense(1)(sout)

model1 = Model(inp, m1_out)
model2 = Model(inp, m2_out)

Now if we set `model1.trainable = False`, the whole of `model1` is frozen (i.e. training `model1` does not update the weights of its underlying layers, including `shared_layer`); however, `shared_layer` and `model2` are still trainable (i.e. training `model2` would update the weights of all of its layers, including `shared_layer`). On the other hand, if we set `model1.layers[1].trainable = False`, then `shared_layer` is frozen, and therefore its weights will not be updated when training either `model1` or `model2`. This way you have much more control and flexibility, and you can build more complex architectures (e.g. GANs).
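Here is a minimal sketch of how you could check this behaviour, continuing the example above. Note that the layer-level flag shown matches the Keras version this answer was written for; newer tf.keras releases propagate `trainable` down to sublayers, so the second printed value may differ there:

# Freeze model1 only: its list of trainable weights becomes empty,
# but the shared layer itself still reports trainable=True (in the
# Keras version discussed here; newer tf.keras may propagate the flag).
model1.trainable = False
print(model1.trainable_weights)  # []
print(shared_layer.trainable)    # True

# Freezing the layer itself affects every model that uses it:
model1.trainable = True
shared_layer.trainable = False
print(any(w is shared_layer.kernel for w in model2.trainable_weights))  # False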

– today
  • So I came across a code snippet where the author first sets all the layers' `trainable = False`, then sets `model.trainable = False`, and then calls `model.compile`. Now if I want to reuse the layers whose weights have been frozen (as done above), then I can just set `layer.trainable = True` again and reuse the layers, right? Btw, thanks for the above answer, it cleared my initial confusion. – Harshit Trehan Apr 22 '21 at 20:18
  • 1
    @HarshitTrehan If by "reuse" you mean you want to make the weights of the layer trainable again, then you are right: you first set the `trainable` to `True` and then compile the model (i.e. call the `compile` method) to make this change effective (otherwise, the trainability status of the layer would NOT change without compiling the model). – today Apr 23 '21 at 06:55
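In other words, something like the following hypothetical sketch (reusing the names from the answer above; the compile arguments are illustrative):

shared_layer.trainable = True                 # unfreeze the layer again
model1.compile(optimizer='adam', loss='mse')  # recompile so the change takes effect
# Without recompiling, model1 keeps training with the trainability
# snapshot taken at the previous compile call.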