I trained a model on 100,000 samples, and it performs well on both the training set and the test set. I then tried to fine-tune it on one particular sample (one of the 100,000), using the trained weights as the initialization.
But the result is strange, and I believe it is caused by the batch normalization layer. Specifically, my code is as follows:
model = mymodel()
model.load_weights('./pre_trained.h5')  # start from the pre-trained weights
rate = model.evaluate(x, y)
print(rate)
checkpoint = tf.keras.callbacks.ModelCheckpoint('./trained.h5', monitor='loss',
                                                verbose=0, save_best_only=True,
                                                mode='min', save_weights_only=True)
model.fit(x, y, validation_data=(x, y), epochs=200, verbose=2, callbacks=[checkpoint])
model.load_weights('./trained.h5')
rate = model.evaluate(x, y)
print(rate)
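For reference, mymodel() is along these lines (a simplified sketch with illustrative layer sizes, not my exact network):

```python
import tensorflow as tf

def mymodel(input_dim=16):
    # Hypothetical reconstruction: Dense layers interleaved with
    # BatchNormalization, ending in a single output unit.
    inp = tf.keras.layers.Input(shape=(input_dim,))
    out = tf.keras.layers.Dense(32, activation='relu')(inp)
    out = tf.keras.layers.BatchNormalization()(out)
    out = tf.keras.layers.Dense(32, activation='relu')(out)
    out = tf.keras.layers.BatchNormalization()(out)
    out = tf.keras.layers.Dense(1)(out)
    return tf.keras.Model(inputs=inp, outputs=out)
```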
mymodel() is a self-defined function that builds the model, which consists of Dense and BatchNormalization layers. x and y are the input and label of the one particular sample, and I want to further optimize the model's loss on that sample. However, the result is strange:
1/1 [==============================] - 0s 209ms/step
-6.087581634521484
Train on 1 samples, validate on 1 samples
Epoch 1/200
- 1s - loss: -2.7749e-01 - val_loss: -6.0876e+00
Epoch 2/200
- 0s - loss: -2.8791e-01 - val_loss: -6.0876e+00
Epoch 3/200
- 0s - loss: -3.0012e-01 - val_loss: -6.0876e+00
Epoch 4/200
- 0s - loss: -3.1325e-01 - val_loss: -6.0876e+00
As shown, model.evaluate works well at first: the reported loss (-6.087581634521484) matches the performance of the loaded pre-trained model. But the training loss reported by model.fit() is strange. The val_loss is normal and close to the model.evaluate result in the first line, while the train loss is much worse, even though the training sample and the validation sample (which is actually the same as the training set in model.fit()) are the same one. I would expect the two losses to be identical, or at least very close. I suspect the problem is caused by the BN layer, because of the difference between its training and inference behavior. However, I have already set trainable = False on the BN layer, after loading the pre-trained weights and before calling model.fit, and the problem is not solved:
out = tf.keras.layers.BatchNormalization(trainable=False)(out)
I still suspect the BN layer, and wonder whether setting trainable=False is enough to keep the parameters of BN the same.
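To illustrate why I doubt BN, here is a pure-NumPy sketch (my own simplification, not the Keras implementation) of its two modes. In training mode BN normalizes with the current batch's statistics, so with a batch of a single sample the batch mean equals the sample itself and the normalized output collapses to zero; in inference mode it normalizes with the stored moving averages:

```python
import numpy as np

def batchnorm(x, moving_mean, moving_var, gamma=1.0, beta=0.0,
              eps=1e-3, training=False):
    """Simplified batch normalization (my approximation, not Keras source).

    training=True  -> normalize with the current batch's statistics
    training=False -> normalize with the stored moving averages
    """
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        mean = moving_mean
        var = moving_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# One sample with 3 features, plus moving statistics learned on the full data
x = np.array([[2.0, -1.0, 0.5]])
moving_mean = np.array([0.1, 0.0, -0.2])
moving_var = np.array([1.5, 0.8, 2.0])

train_out = batchnorm(x, moving_mean, moving_var, training=True)
infer_out = batchnorm(x, moving_mean, moving_var, training=False)

# With batch size 1, the batch mean equals the sample itself, so the
# training-mode output collapses to zero regardless of the input
print(train_out)  # -> [[0. 0. 0.]]
print(infer_out)  # normalized against the moving statistics, nonzero
```

So if model.fit() runs BN in training mode despite trainable=False, the activations (and hence the train loss) would differ from what model.evaluate sees, which looks like exactly my symptom.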
Can anyone give me some advice? Thanks a lot in advance for your help. Sorry for my English; I tried my best to explain my problem.