
I am doing emotion recognition with convolutional neural networks in MatConvNet. I have one main big dataset (A, with 40,000 images) and two harder, smaller datasets (B and C, with 5,000 images) with the same classes. When I train my network on dataset A with random weight initialization, I get 70% accuracy.

So, I wanted to increase the performance by initializing with weights pretrained on datasets B and C on the same network architecture. When fine-tuning my network on dataset A, I take only the three initial layers (conv, relu, pool) from the pretrained network. However, I get a lower result than with random weights. I also tried taking all the layers, the first six layers, and just the first layer.

Am I understanding and implementing this correctly? Instead of random weights in the first three layers (actually just in the first one, the conv layer), I use the ones from the pretrained network, as in the sketch below. Now I am not sure I understand the concept well.
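For concreteness, this is the kind of weight copy I mean (cnn_init is a hypothetical helper that builds my architecture with random weights; the pretrained net is in MatConvNet's SimpleNN format):

net = cnn_init();                              % same architecture, random weights (hypothetical helper)
pre = load('net-epoch-100.mat');               % network pretrained on B and C
net.layers{1}.weights = pre.layers{1}.weights; % copy the first conv layer's filters and biases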

I use the following code for fine-tuning:

net = load('net-epoch-100.mat');       % network pretrained on B and C
trainOpts.learningRate = [0.004*ones(1,25), 0.002*ones(1,25), ...
                          0.001*ones(1,25), 0.0005*ones(1,25)];
                                       % (I used a much higher learning rate
                                       % when pretraining on datasets B and C.)
net.layers = net.layers(1:end-13);     % keep only the first three layers of the pretrained net
% ... the rest of the layers
– Nicole

1 Answer


"I only take three initial layers(conv,relu,pool) from pretrained network when finetuning my network on dataset A."

Since relu and pool layers have no trainable weights, you essentially used only one layer from the pretrained network. The first conv layer just does some low-level edge detection and does not capture any high-level visual concepts. The usual best practice for transfer learning is to reuse the high-level features of an ImageNet-pretrained network: remove its top layers, attach new ones for your problem, then first fine-tune on your large dataset and afterwards on your small dataset.
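For example, a minimal SimpleNN sketch of this recipe (the model file imagenet-vgg-f.mat, the number of layers to drop, and the new layer sizes are assumptions you would adapt to your own setup):

net = load('imagenet-vgg-f.mat');      % ImageNet-pretrained model (assumed file name)
net.layers = net.layers(1:end-2);      % drop the original classifier (fc8) and softmax
% Attach a new classifier for the emotion classes (assuming 7 classes and
% a 4096-dimensional feature layer, as in VGG-F):
net.layers{end+1} = struct('type', 'conv', ...
    'weights', {{0.01*randn(1,1,4096,7,'single'), zeros(1,7,'single')}}, ...
    'stride', 1, 'pad', 0);
net.layers{end+1} = struct('type', 'softmaxloss');

You would then fine-tune this network on your large dataset A first, and afterwards on the smaller B and C if needed.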

– DataHungry
  • Thanks for the answer @DataHungry. I also tried the same with 3 conv layers, but the results were also worse. I will try ImageNet as you suggest. Do you know how I can implement "pretrained features from high layers" in Matlab? Do I load the pretrained network, take the first few layers from it, and then follow with my own layers? – Nicole May 24 '17 at 08:30
  • @Nicole You should keep most of the layers of the pretrained network, not only the first few. I suggest you: (1) load the pretrained network, (2) remove the last several layers, (3) attach several new layers to adapt it to your problem. – DataHungry May 26 '17 at 23:22
  • Now I got almost 10% higher accuracy! Thank you! – Nicole May 29 '17 at 16:33