
I'm trying to fine-tune the last two layers of a VGG model on the LFW dataset. I removed the original softmax layer and added my own softmax layer with 19 outputs, since there are 19 classes I'm trying to train on. I also want to fine-tune the last fully connected layer in order to build a "custom feature extractor".

I'm setting the layers that I want to be non-trainable like this:

for layer in model.layers:
    layer.trainable = False
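
For context, here is a simplified sketch of roughly what my setup looks like (not my exact code: `vgg` stands for the pre-trained model loaded elsewhere, the optimizer is just a placeholder, and the `fc7` layer name matches the summary below; Keras 1.x style):

from keras.layers import Dense
from keras.layers.normalization import BatchNormalization
from keras.models import Model

# freeze every pre-trained layer first
for layer in vgg.layers:
    layer.trainable = False

# take fc7's output and attach a new 19-way softmax head
x = vgg.get_layer('fc7').output
x = BatchNormalization()(x)
predictions = Dense(19, activation='softmax')(x)

model = Model(input=vgg.input, output=predictions)   # Keras 1.x keyword names

# fc7 stays trainable so it can act as a custom feature extractor
model.get_layer('fc7').trainable = True

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])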

Using a GPU it takes roughly two hours per epoch (about 7,350 s per epoch in the log below) to train on 19 classes with a minimum of 40 images per class.

Since I don't have many samples, this training time seems strange.

Does anyone know why this is happening?

Here is the log:

Image shape:  (224, 224, 3)
Number of classes:  19
K.image_dim_ordering: th

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 3, 224, 224)   0                                            
____________________________________________________________________________________________________
conv1_1 (Convolution2D)          (None, 64, 224, 224)  1792        input_1[0][0]                    
____________________________________________________________________________________________________
conv1_2 (Convolution2D)          (None, 64, 224, 224)  36928       conv1_1[0][0]                    
____________________________________________________________________________________________________
pool1 (MaxPooling2D)             (None, 64, 112, 112)  0           conv1_2[0][0]                    
____________________________________________________________________________________________________
conv2_1 (Convolution2D)          (None, 128, 112, 112) 73856       pool1[0][0]                      
____________________________________________________________________________________________________
conv2_2 (Convolution2D)          (None, 128, 112, 112) 147584      conv2_1[0][0]                    
____________________________________________________________________________________________________
pool2 (MaxPooling2D)             (None, 128, 56, 56)   0           conv2_2[0][0]                    
____________________________________________________________________________________________________
conv3_1 (Convolution2D)          (None, 256, 56, 56)   295168      pool2[0][0]                      
____________________________________________________________________________________________________
conv3_2 (Convolution2D)          (None, 256, 56, 56)   590080      conv3_1[0][0]                    
____________________________________________________________________________________________________
conv3_3 (Convolution2D)          (None, 256, 56, 56)   590080      conv3_2[0][0]                    
____________________________________________________________________________________________________
pool3 (MaxPooling2D)             (None, 256, 28, 28)   0           conv3_3[0][0]                    
____________________________________________________________________________________________________
conv4_1 (Convolution2D)          (None, 512, 28, 28)   1180160     pool3[0][0]                      
____________________________________________________________________________________________________
conv4_2 (Convolution2D)          (None, 512, 28, 28)   2359808     conv4_1[0][0]                    
____________________________________________________________________________________________________
conv4_3 (Convolution2D)          (None, 512, 28, 28)   2359808     conv4_2[0][0]                    
____________________________________________________________________________________________________
pool4 (MaxPooling2D)             (None, 512, 14, 14)   0           conv4_3[0][0]                    
____________________________________________________________________________________________________
conv5_1 (Convolution2D)          (None, 512, 14, 14)   2359808     pool4[0][0]                      
____________________________________________________________________________________________________
conv5_2 (Convolution2D)          (None, 512, 14, 14)   2359808     conv5_1[0][0]                    
____________________________________________________________________________________________________
conv5_3 (Convolution2D)          (None, 512, 14, 14)   2359808     conv5_2[0][0]                    
____________________________________________________________________________________________________
pool5 (MaxPooling2D)             (None, 512, 7, 7)     0           conv5_3[0][0]                    
____________________________________________________________________________________________________
flatten (Flatten)                (None, 25088)         0           pool5[0][0]                      
____________________________________________________________________________________________________
fc6 (Dense)                      (None, 4096)          102764544   flatten[0][0]                    
____________________________________________________________________________________________________
fc7 (Dense)                      (None, 4096)          16781312    fc6[0][0]                        
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 4096)          16384       fc7[0][0]                        
____________________________________________________________________________________________________
fc8 (Dense)                      (None, 19)            77843       batchnormalization_1[0][0]       
====================================================================================================
Total params: 134,354,771
Trainable params: 16,867,347
Non-trainable params: 117,487,424
____________________________________________________________________________________________________
None
Train on 1120 samples, validate on 747 samples
Epoch 1/20
1120/1120 [==============================] - 7354s - loss: 2.9517 - acc: 0.0714 - val_loss: 2.9323 - val_acc: 0.2316
Epoch 2/20
1120/1120 [==============================] - 7356s - loss: 2.8053 - acc: 0.1732 - val_loss: 2.9187 - val_acc: 0.3614
Epoch 3/20
1120/1120 [==============================] - 7358s - loss: 2.6727 - acc: 0.2643 - val_loss: 2.9034 - val_acc: 0.3882
Epoch 4/20
1120/1120 [==============================] - 7361s - loss: 2.5565 - acc: 0.3071 - val_loss: 2.8861 - val_acc: 0.4016
Epoch 5/20
1120/1120 [==============================] - 7360s - loss: 2.4597 - acc: 0.3518 - val_loss: 2.8667 - val_acc: 0.4043
Epoch 6/20
1120/1120 [==============================] - 7363s - loss: 2.3827 - acc: 0.3714 - val_loss: 2.8448 - val_acc: 0.4163
Epoch 7/20
1120/1120 [==============================] - 7364s - loss: 2.3108 - acc: 0.4045 - val_loss: 2.8196 - val_acc: 0.4244
Epoch 8/20
1120/1120 [==============================] - 7377s - loss: 2.2463 - acc: 0.4268 - val_loss: 2.7905 - val_acc: 0.4324
Epoch 9/20
1120/1120 [==============================] - 7373s - loss: 2.1824 - acc: 0.4563 - val_loss: 2.7572 - val_acc: 0.4404
Epoch 10/20
1120/1120 [==============================] - 7373s - loss: 2.1313 - acc: 0.4732 - val_loss: 2.7190 - val_acc: 0.4471
Epoch 11/20
1120/1120 [==============================] - 7440s - loss: 2.0766 - acc: 0.5036 - val_loss: 2.6754 - val_acc: 0.4565
Epoch 12/20
1120/1120 [==============================] - 7414s - loss: 2.0323 - acc: 0.5170 - val_loss: 2.6263 - val_acc: 0.4565
Epoch 13/20
1120/1120 [==============================] - 7413s - loss: 1.9840 - acc: 0.5420 - val_loss: 2.5719 - val_acc: 0.4592
Epoch 14/20
1120/1120 [==============================] - 7414s - loss: 1.9467 - acc: 0.5464 - val_loss: 2.5130 - val_acc: 0.4592
Epoch 15/20
1120/1120 [==============================] - 7412s - loss: 1.9039 - acc: 0.5652 - val_loss: 2.4513 - val_acc: 0.4592
Epoch 16/20
1120/1120 [==============================] - 7413s - loss: 1.8716 - acc: 0.5723 - val_loss: 2.3906 - val_acc: 0.4578
Epoch 17/20
1120/1120 [==============================] - 7415s - loss: 1.8214 - acc: 0.5866 - val_loss: 2.3319 - val_acc: 0.4538
Epoch 18/20
1120/1120 [==============================] - 7416s - loss: 1.7860 - acc: 0.5982 - val_loss: 2.2789 - val_acc: 0.4538
Epoch 19/20
1120/1120 [==============================] - 7430s - loss: 1.7623 - acc: 0.5973 - val_loss: 2.2322 - val_acc: 0.4538
Epoch 20/20
1120/1120 [==============================] - 7856s - loss: 1.7222 - acc: 0.6170 - val_loss: 2.1913 - val_acc: 0.4538
Accuracy: 45.38%

The results are not good, and I can't train on more data because it takes too long. Any ideas?

  • In addition to "Marcin Możejko", what about the following: 1. remove the top (dense) layers. 2. calculate the output of the network for your images (so you'll have 19*40 vectors). 3. train your new Dense part on these vectors. 4. combine the two networks (CNN and Dense) (note that it may not give a very good result). – Alexander Pozharskii Apr 08 '17 at 00:58
  • I thought about it. What you're suggesting is to extract the features from the images and then train sequential dense layers on those features? – Eric Apr 08 '17 at 08:37
  • Yep. Just extract the feature vectors from the images and train the Dense layers. Maybe you'll get an acceptable result. – Alexander Pozharskii Apr 08 '17 at 16:56
  • Okay, I will try it tomorrow and let you know – Eric Apr 08 '17 at 20:05
  • Still slow, but it works. I'm at 80% validation accuracy and 1.9 loss after 20 epochs, so maybe I need more data for each class... – Eric Apr 12 '17 at 07:20
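
For reference, a minimal sketch of the feature-extraction approach Alexander Pozharskii describes above (Keras 1.x style; `vgg` is the pre-trained model, `fc6` comes from the summary above, and the 128-unit head, the optimizer and the `X_train`/`y_train` arrays are assumptions for illustration):

from keras.layers import Input, Dense
from keras.models import Model

# 1. Feature extractor: VGG up to fc6, used for inference only.
feature_model = Model(input=vgg.input, output=vgg.get_layer('fc6').output)

# 2. Push every image through it ONCE and cache the 4096-d vectors.
train_features = feature_model.predict(X_train, batch_size=32)
val_features = feature_model.predict(X_val, batch_size=32)

# 3. Train a small Dense classifier on the cached vectors only.
inp = Input(shape=(train_features.shape[1],))
x = Dense(128, activation='relu')(inp)
out = Dense(19, activation='softmax')(x)
head = Model(input=inp, output=out)
head.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
head.fit(train_features, y_train,
         validation_data=(val_features, y_val),
         nb_epoch=20, batch_size=32)

Because the convolutional part is evaluated only once per image, each subsequent epoch touches just the small head, which is where the speed-up comes from.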

1 Answer


Please notice that you are feeding ~ 19 * 40 < 800 examples in order to train 16,867,347 parameters, which is roughly 2e4 parameters per example. This simply cannot work properly. Try deleting all the fully connected layers at the top (the Dense layers) and putting smaller Dense layers in their place, e.g. ~ 50 neurons each. In my opinion this should help you improve accuracy and speed up training.
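
For illustration, a sketch of this suggestion (Keras 1.x syntax; `flatten` is the layer name from the summary above, while `vgg`, the 50-unit sizes and the optimizer are only placeholders, not the asker's exact configuration):

from keras.layers import Dense
from keras.models import Model

# keep the convolutional base frozen
for layer in vgg.layers:
    layer.trainable = False

# branch off right after the flatten layer and skip the huge fc6/fc7 blocks
x = vgg.get_layer('flatten').output
x = Dense(50, activation='relu')(x)            # ~1.25M params instead of ~103M in fc6
x = Dense(50, activation='relu')(x)
predictions = Dense(19, activation='softmax')(x)

small_model = Model(input=vgg.input, output=predictions)
small_model.compile(optimizer='adam',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])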

  • Yeah, I know. I've tried it, but the performance is poor; the validation accuracy freezes at 20% with a minimum of 20 images per class. So I'm planning to change my dataset, because LFW has a lot of classes with only 1 image. Maybe if I take FaceScrub, which has more examples per class, it will work better with the original VGG, e.g. taking around 100 classes with a minimum of 200 images per class. What do you think? Thanks!! – Eric Apr 08 '17 at 08:28
  • What do you think about the computational time? – Eric Apr 08 '17 at 08:38
  • I changed the dataset (now I'm using FaceScrub) and tried what you proposed, with 2 Dense layers of 128 neurons each, but it's still slow. I think this comes from the convolutional layers, because my image dimensions are 224*224. My results now are `val_loss: 2.4294 - val_acc: 0.8350` classifying 50 classes with 23 images per class. Should I take more data? The loss decreases very slowly – Eric Apr 12 '17 at 11:21
  • What is your `batch_size`? – Marcin Możejko Apr 12 '17 at 11:25
  • And how long does the epoch last now? How many examples are fed during an epoch? – Marcin Możejko Apr 12 '17 at 11:33
  • Here is the log of the last epoch: `Epoch 20/20 1200/1200 [==============================] - 6914s - loss: 2.2231 - acc: 0.8608 - val_loss: 2.4294 - val_acc: 0.8350`. I'm using 1200 examples – Eric Apr 12 '17 at 11:35
  • What GPU do you have? Are you sure it's working? Which backend do you use? – Marcin Możejko Apr 12 '17 at 11:36
  • I don't know the GPU model; I'm running my experiments on a server and it's supposed to be working. I'm using the TensorFlow backend – Eric Apr 12 '17 at 11:41
  • I had installed both versions of TensorFlow (GPU and CPU), and by default Keras was taking the CPU version, so I removed that version and it worked... Thanks!!! :) – Eric Apr 12 '17 at 20:45
  • A small doubt: how can having smaller dense layers improve accuracy? – Amruth Lakkavaram Jan 23 '19 at 04:42
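
A side note on the GPU issue that turned out to be the culprit in the comments above: a quick way to check whether the TensorFlow backend actually sees a GPU (TF 1.x era API; if only `/cpu:0` is listed, the CPU-only build of TensorFlow is the one being used):

from tensorflow.python.client import device_lib

# lists the devices TensorFlow can use on this machine
devices = device_lib.list_local_devices()
print([d.name for d in devices])   # expect something like '/gpu:0' alongside '/cpu:0'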