I am trying to compare a fine-tuned VGGFace model, which keeps the pre-trained VGGFace weights, with a fully retrained one. When I use the fine-tuned model, I get a decent accuracy score. However, when I retrain the entire model by unfreezing all the weights, the accuracy drops to close to random.

I suspect this may be due to the small dataset: I know VGGFace was trained on millions of samples, while my dataset only has 1400 samples (700 per class for a binary classification problem). But I also want to be sure that I joined the VGGFace model with my custom head correctly. Could it also be due to the learning rate being too high?

The model is set up with the following code.

from keras import backend as K
from keras import optimizers
from keras.callbacks import EarlyStopping
from keras.layers import Activation, Dense
from keras.models import Model
from keras_vggface.vggface import VGGFace

def Train_VGG_Model(train_layers=False):
    print('=' * 65)
    K.clear_session()

    # Load VGG16 with the pre-trained VGGFace weights and branch off
    # after the fc7/relu layer
    vggface_model = VGGFace(model='vgg16')
    x = vggface_model.get_layer('fc7/relu').output

    # Custom classification head for the binary task
    x = Dense(512, name='custom_fc8')(x)
    x = Activation('relu', name='custom_fc8/relu')(x)
    x = Dense(64, name='custom_fc9')(x)
    x = Activation('relu', name='custom_fc9/relu')(x)
    x = Dense(1, name='custom_fc10')(x)
    out = Activation('sigmoid', name='custom_fc10/sigmoid')(x)
    custom_model = Model(vggface_model.input, out,
                         name='Custom VGGFace Model')

    # The custom head is always trainable; the pre-trained layers are
    # trainable only when train_layers=True
    for layer in custom_model.layers:
        layer.trainable = True if 'custom_' in layer.name else train_layers
        print('Layer name:', layer.name,
              '... Trainable:', layer.trainable)

    print('=' * 65)
    opt = optimizers.Adam(lr=1e-5)
    custom_model.compile(loss='binary_crossentropy',
                         metrics=['accuracy'],
                         optimizer=opt)
    custom_model.summary()
    return custom_model

callbacks = [EarlyStopping(monitor='val_loss', patience=1, mode='auto')]
train_layers = True  # unfreeze the whole network for the full retraining run
model = Train_VGG_Model(train_layers=train_layers)
model.fit(X_train, y_train, batch_size=32, epochs=100,
          callbacks=callbacks, validation_data=(X_valid, y_valid))

Outputs:

Layer name: input_1 ... Trainable: True
Layer name: conv1_1 ... Trainable: True
Layer name: conv1_2 ... Trainable: True
Layer name: pool1 ... Trainable: True
Layer name: conv2_1 ... Trainable: True
Layer name: conv2_2 ... Trainable: True
Layer name: pool2 ... Trainable: True
Layer name: conv3_1 ... Trainable: True
Layer name: conv3_2 ... Trainable: True
Layer name: conv3_3 ... Trainable: True
Layer name: pool3 ... Trainable: True
Layer name: conv4_1 ... Trainable: True
Layer name: conv4_2 ... Trainable: True
Layer name: conv4_3 ... Trainable: True
Layer name: pool4 ... Trainable: True
Layer name: conv5_1 ... Trainable: True
Layer name: conv5_2 ... Trainable: True
Layer name: conv5_3 ... Trainable: True
Layer name: pool5 ... Trainable: True
Layer name: flatten ... Trainable: True
Layer name: fc6 ... Trainable: True
Layer name: fc6/relu ... Trainable: True
Layer name: fc7 ... Trainable: True
Layer name: fc7/relu ... Trainable: True
Layer name: custom_fc8 ... Trainable: True
Layer name: custom_fc8/relu ... Trainable: True
Layer name: custom_fc9 ... Trainable: True
Layer name: custom_fc9/relu ... Trainable: True
Layer name: custom_fc10 ... Trainable: True
Layer name: custom_fc10/sigmoid ... Trainable: True
=================================================================
Model: "Custom VGGFace Model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 224, 224, 64)      1792      
_________________________________________________________________
conv1_2 (Conv2D)             (None, 224, 224, 64)      36928     
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 112, 112, 64)      0         
_________________________________________________________________
conv2_1 (Conv2D)             (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2_2 (Conv2D)             (None, 112, 112, 128)     147584    
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 56, 56, 128)       0         
_________________________________________________________________
conv3_1 (Conv2D)             (None, 56, 56, 256)       295168    
_________________________________________________________________
conv3_2 (Conv2D)             (None, 56, 56, 256)       590080    
_________________________________________________________________
conv3_3 (Conv2D)             (None, 56, 56, 256)       590080    
_________________________________________________________________
pool3 (MaxPooling2D)         (None, 28, 28, 256)       0         
_________________________________________________________________
conv4_1 (Conv2D)             (None, 28, 28, 512)       1180160   
_________________________________________________________________
conv4_2 (Conv2D)             (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv4_3 (Conv2D)             (None, 28, 28, 512)       2359808   
_________________________________________________________________
pool4 (MaxPooling2D)         (None, 14, 14, 512)       0         
_________________________________________________________________
conv5_1 (Conv2D)             (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_2 (Conv2D)             (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_3 (Conv2D)             (None, 14, 14, 512)       2359808   
_________________________________________________________________
pool5 (MaxPooling2D)         (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc6 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc6/relu (Activation)        (None, 4096)              0         
_________________________________________________________________
fc7 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
fc7/relu (Activation)        (None, 4096)              0         
_________________________________________________________________
custom_fc8 (Dense)           (None, 512)               2097664   
_________________________________________________________________
custom_fc8/relu (Activation) (None, 512)               0         
_________________________________________________________________
custom_fc9 (Dense)           (None, 64)                32832     
_________________________________________________________________
custom_fc9/relu (Activation) (None, 64)                0         
_________________________________________________________________
custom_fc10 (Dense)          (None, 1)                 65        
_________________________________________________________________
custom_fc10/sigmoid (Activat (None, 1)                 0         
=================================================================
Total params: 136,391,105
Trainable params: 136,391,105
Non-trainable params: 0
_________________________________________________________________
Train on 784 samples, validate on 336 samples
Epoch 1/100
784/784 [==============================] - 235s 300ms/step - loss: 0.7987 - accuracy: 0.5051 - val_loss: 0.6932 - val_accuracy: 0.5149
Epoch 2/100
784/784 [==============================] - 233s 298ms/step - loss: 0.6935 - accuracy: 0.4605 - val_loss: 0.6932 - val_accuracy: 0.4792
Epoch 3/100
784/784 [==============================] - 236s 301ms/step - loss: 0.6932 - accuracy: 0.5089 - val_loss: 0.6932 - val_accuracy: 0.4792
280/280 [==============================] - 12s 45ms/step

Thanks in advance, and excuse me if my question doesn't make sense. I'm very new to this.

– Dawei Wang
1 Answer

If you already have good weights that were trained on a large enough dataset, it's usually best to fine-tune only the last few layers and keep the earlier layers frozen.

In any convolutional network, the initial layers act as feature extractors, and a good pre-trained model has already learned strong features from a large, representative dataset.

Once you retrain the entire model, you give that up: the weights are not reset, but large gradient updates driven by your new dataset (which is probably much smaller and has a different distribution than the original one) quickly overwrite the pre-trained features. This can make the model perform poorly.
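
To see that unfreezing by itself does not discard the pre-trained weights, here is a minimal sanity check (a sketch, reusing the Train_VGG_Model function from your question): toggling trainable and recompiling leaves the weights untouched; only the subsequent fit call changes them.

import numpy as np

model = Train_VGG_Model(train_layers=True)
w_before = model.get_layer('conv1_1').get_weights()[0].copy()

# Flipping `trainable` and recompiling never re-initialises weights
for layer in model.layers:
    layer.trainable = True
model.compile(loss='binary_crossentropy', metrics=['accuracy'],
              optimizer=optimizers.Adam(lr=1e-5))

w_after = model.get_layer('conv1_1').get_weights()[0]
print(np.allclose(w_before, w_after))  # True: VGGFace weights still intact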

If you really want to train the whole model, another thing you can try is discriminative learning rates: a very small learning rate (1e-5 to 1e-6) for the initial layers and something larger, like 1e-3, for the final layers.
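
Plain Keras has no built-in per-layer learning rates, but a common approximation is staged fine-tuning: first train only the custom head with the base frozen, then unfreeze everything and continue at a much smaller learning rate. A minimal sketch, reusing Train_VGG_Model, callbacks, and the data from your question (the epoch counts and the 1e-6 rate are illustrative assumptions, not tuned values):

# Phase 1: train only the custom head on top of the frozen VGGFace base
model = Train_VGG_Model(train_layers=False)
model.fit(X_train, y_train, batch_size=32, epochs=10,
          callbacks=callbacks, validation_data=(X_valid, y_valid))

# Phase 2: unfreeze everything and fine-tune gently with a tiny step size
for layer in model.layers:
    layer.trainable = True
model.compile(loss='binary_crossentropy', metrics=['accuracy'],
              optimizer=optimizers.Adam(lr=1e-6))  # assumed, much smaller lr
model.fit(X_train, y_train, batch_size=32, epochs=10,
          callbacks=callbacks, validation_data=(X_valid, y_valid))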

– Zabir Al Nazi
  • Thanks for your suggestions! I actually tried retraining the entire VGG16 model from scratch in Keras (not VGGFace) and the results were good, i.e. above chance level. VGG16 and VGGFace are supposedly based on the same VGG16 architecture, so I don't understand why the VGG16 model loaded from Keras does well but the one loaded from VGGFace does not. Could it be due to weight initialization? I.e., when I unfreeze the weights, is it not throwing away the original weights? Thanks! – Dawei Wang May 14 '20 at 23:14
  • So I trained again with a smaller learning rate and it worked. I also tried a GPU instead of a CPU, though I'm not sure if that makes a difference. – Dawei Wang May 19 '20 at 00:38
  • Yes, that makes sense. A GPU can make a difference, but mostly by reducing training time; in some cases it can lead to better performance too, as GPU algorithms are implemented slightly differently than CPU ones. – Zabir Al Nazi May 19 '20 at 03:18