I trained a model on 100,000 samples, and it performs well on both the training set and the test set. I then tried to fine-tune it on one particular sample (one of the 100,000), using the trained weights as the initialization.
But the result is strange, and I believe it is caused by the batch normalization layer. Specifically, my code is as follows:
model = mymodel()
model.load_weights('./pre_trained.h5')  # start from the pre-trained weights
rate = model.evaluate(x, y)
print(rate)
checkpoint = tf.keras.callbacks.ModelCheckpoint('./trained.h5', monitor='loss',
                                                verbose=0, save_best_only=True,
                                                mode='min', save_weights_only=True)
model.fit(x, y, validation_data=(x, y), epochs=200, verbose=2, callbacks=[checkpoint])
model.load_weights('./trained.h5')
rate = model.evaluate(x, y)
print(rate)
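For reference, mymodel() is along these lines (a simplified sketch with illustrative layer sizes, not my exact network):

```python
import tensorflow as tf

def mymodel(input_dim=16):
    # Hypothetical reconstruction: Dense layers interleaved with
    # BatchNormalization, ending in a single output unit.
    inp = tf.keras.layers.Input(shape=(input_dim,))
    out = tf.keras.layers.Dense(32, activation='relu')(inp)
    out = tf.keras.layers.BatchNormalization()(out)
    out = tf.keras.layers.Dense(32, activation='relu')(out)
    out = tf.keras.layers.BatchNormalization()(out)
    out = tf.keras.layers.Dense(1)(out)
    return tf.keras.Model(inputs=inp, outputs=out)
```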
mymodel() is a self-defined function that builds the model, which consists of Dense and BatchNormalization layers. x and y are the input and label of the one particular sample, and I want to further optimize the model's loss on that sample. However, the result is strange:
1/1 [==============================] - 0s 209ms/step
-6.087581634521484
Train on 1 samples, validate on 1 samples
Epoch 1/200
- 1s - loss: -2.7749e-01 - val_loss: -6.0876e+00
Epoch 2/200
- 0s - loss: -2.8791e-01 - val_loss: -6.0876e+00
Epoch 3/200
- 0s - loss: -3.0012e-01 - val_loss: -6.0876e+00
Epoch 4/200
- 0s - loss: -3.1325e-01 - val_loss: -6.0876e+00
As shown, model.evaluate works well at first: the reported loss (-6.087581634521484) matches the performance of the loaded pre-trained model. But the training loss reported by model.fit() is strange. The val_loss is normal and close to the model.evaluate result in the first line, while the train loss is much worse, even though the training sample and the validation sample (which is actually the same as the training set in model.fit()) are the same one. I would expect the two losses to be identical, or at least very close. I suspect the problem is caused by the BN layer, because of the difference between its training and inference behavior. However, I have already set trainable = False on the BN layer, after loading the pre-trained weights and before calling model.fit, and the problem is not solved:
out = tf.keras.layers.BatchNormalization(trainable=False)(out)
I still suspect the BN layer, and wonder whether setting trainable=False is enough to keep the parameters of BN the same.
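To illustrate why I doubt BN, here is a pure-NumPy sketch (my own simplification, not the Keras implementation) of its two modes. In training mode BN normalizes with the current batch's statistics, so with a batch of a single sample the batch mean equals the sample itself and the normalized output collapses to zero; in inference mode it normalizes with the stored moving averages:

```python
import numpy as np

def batchnorm(x, moving_mean, moving_var, gamma=1.0, beta=0.0,
              eps=1e-3, training=False):
    """Simplified batch normalization (my approximation, not Keras source).

    training=True  -> normalize with the current batch's statistics
    training=False -> normalize with the stored moving averages
    """
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        mean = moving_mean
        var = moving_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# One sample with 3 features, plus moving statistics learned on the full data
x = np.array([[2.0, -1.0, 0.5]])
moving_mean = np.array([0.1, 0.0, -0.2])
moving_var = np.array([1.5, 0.8, 2.0])

train_out = batchnorm(x, moving_mean, moving_var, training=True)
infer_out = batchnorm(x, moving_mean, moving_var, training=False)

# With batch size 1, the batch mean equals the sample itself, so the
# training-mode output collapses to zero regardless of the input
print(train_out)  # -> [[0. 0. 0.]]
print(infer_out)  # normalized against the moving statistics, nonzero
```

So if model.fit() runs BN in training mode despite trainable=False, the activations (and hence the train loss) would differ from what model.evaluate sees, which looks like exactly my symptom.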
Can anyone give me some advice? Thanks a lot in advance for your help. Sorry for my English; I tried my best to explain my problem.