I am doing emotion recognition with convolutional neural networks in MatConvNet. I have one large main dataset (A, with 40,000 images) and two smaller, harder datasets (B and C, with 5,000 images) with the same classes. When I train my network on dataset A with random weight initialization, I get 70% accuracy.
So I wanted to improve performance by initializing with weights pretrained on datasets B and C using the same network architecture. When fine-tuning on dataset A, I take only the first three layers (conv, relu, pool) from the pretrained network. However, I get worse results than with random weights. I also tried taking all the layers, the first six layers, and only the first layer.
Am I understanding and implementing this correctly? Instead of random weights in the first three layers (effectively just the first one, conv, since relu and pool layers have no learnable parameters), I use the weights from the pretrained network. Now I am not sure whether I understand the concept correctly.
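As a sanity check that only the conv layer actually carries parameters, I list which layers have learnable weights (a sketch; it assumes the newer SimpleNN format where conv layers store a weights field, and that the file was saved by cnn_train with variables net and info):

tmp = load('net-epoch-100.mat');   % file written by cnn_train holds 'net' and 'info'
net = tmp.net;
for i = 1:numel(net.layers)
    fprintf('layer %d (%s): has weights = %d\n', i, ...
            net.layers{i}.type, isfield(net.layers{i}, 'weights'));
end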
I use the following code for fine-tuning:
tmp = load('net-epoch-100.mat'); % cnn_train saves variables 'net' and 'info' in this file
net = tmp.net;
trainOpts.learningRate = [0.004*ones(1,25), 0.002*ones(1,25), ...
                          0.001*ones(1,25), 0.0005*ones(1,25)];
% I used a much higher learning rate when pretraining on datasets B and C
net.layers = net.layers(1:end-13); % keep only the first three layers of the pretrained net
% ... the rest of the layers
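To make the weight transfer itself explicit, here is a minimal sketch of what I mean by reusing just the first conv layer (buildNetA is a placeholder for my architecture definition; the per-layer learningRate field is the multiplier that cnn_train applies on top of the global learning rate for SimpleNN networks):

tmp = load('net-epoch-100.mat');   % pretrained on B and C
pretrained = tmp.net;

netA = buildNetA();                % full architecture, randomly initialized
netA.layers{1}.weights = pretrained.layers{1}.weights; % copy conv filters and biases
netA.layers{1}.learningRate = [0.1 0.1]; % optional: fine-tune the transferred
                                         % layer more slowly than the new layers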