
I want to train my network using MATLAB and matconvnet-1.0-beta25. My problem is regression, and I use pdist as the loss layer to get an MSE-like objective. The input data is 56×56×64×6000, the target data is also 56×56×64×6000, and the network architecture is as follows:

opts.networkType = 'simplenn' ;
opts = vl_argparse(opts, varargin) ;

lr = [.01 2] ;

% Define network CIFAR10-quick
net.layers = {} ;

% Block 1
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,64,32, 'single'), zeros(1, 32, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,32,16, 'single'), zeros(1,16,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,16,8, 'single'), zeros(1, 8, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,8,16, 'single'), zeros(1,16,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,16,32, 'single'), zeros(1, 32, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,32,64, 'single'), zeros(1,64,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
% Loss layer
net.layers{end+1} = struct('type', 'pdist') ;

% Meta parameters
net.meta.inputSize = [56 56 64] ;
net.meta.trainOpts.learningRate = [0.0005*ones(1,30) 0.0005*ones(1,10) 0.0005*ones(1,5)] ;
net.meta.trainOpts.weightDecay = 0.0001 ;
net.meta.trainOpts.batchSize = 100 ;
net.meta.trainOpts.numEpochs = numel(net.meta.trainOpts.learningRate) ;

% Fill in default values

net = vl_simplenn_tidy(net) ;
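
As a sanity check before training, one can push a dummy batch through the network and print the size of every intermediate output. This is only a sketch: it assumes, as cnn_train does for the other loss layers, that the targets are attached to the pdist layer through its class field.

% Hedged sanity check: forward one dummy batch and print each layer's
% output size. The dummy targets stand in for the real labels.
x = randn(56, 56, 64, 2, 'single') ;                      % two dummy inputs
net.layers{end}.class = randn(56, 56, 64, 2, 'single') ;  % dummy targets
res = vl_simplenn(net, x) ;
for i = 1:numel(res)
  fprintf('output %d: %s\n', i-1, mat2str(size(res(i).x))) ;
end

The last size printed should already show the singleton third dimension discussed below, since vl_nnpdist collapses the channel dimension when computing the distance.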

I changed the getSimpleNNBatch(imdb, batch) function in ncnn_train (my renamed copy of cnn_train) as follows:

function [images, labels] = getSimpleNNBatch(imdb, batch)
    images = imdb.images.data(:,:,:,batch) ;
    labels = imdb.images.labels(:,:,:,batch) ;
    if rand > 0.5
        % flip inputs and targets together so they stay spatially aligned
        images = fliplr(images) ;
        labels = fliplr(labels) ;
    end

because my label is multi-dimensional. I also changed errorFunction in cnn_train from 'multiclass' to 'none':

opts.errorFunction = 'none' ;

and changed the error variable from:

% accumulate errors
error = sum([error, [...
  sum(double(gather(res(end).x))) ;
  reshape(params.errorFunction(params, labels, res),[],1) ; ]],2) ;

to:

% accumulate errors
error = sum([error, [...
  mean(mean(mean(double(gather(res(end).x))))) ;
  reshape(params.errorFunction(params, labels, res),[],1) ; ]],2) ;
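
Since the three nested means average over every dimension in turn, the same value can be written more compactly; a small equivalent snippet (just a restatement, not a behavioural change):

% Same scalar as the triple mean: average over all elements of the batch
obj = mean(reshape(double(gather(res(end).x)), [], 1)) ;

One thing to keep in mind, if I remember the beta25 cnn_train correctly, is that the accumulated error is later divided by the number of processed images before printing, so replacing the original sum with a mean also changes the scale of the reported objective.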

My first question is: why is the third dimension of res(end).x in the command above 1 instead of 64? Its size is 56×56×1×100 (100 is the batch size).

Have I made a mistake?

Here are the results:

train: epoch 01:   2/ 40: 10.1 (27.0) Hz objective: 21360.722
train: epoch 01:   3/ 40: 13.0 (30.0) Hz objective: 67328685.873
...
train: epoch 01:  39/ 40: 29.7 (29.6) Hz objective: 5179175.587
train: epoch 01:  40/ 40: 29.8 (30.6) Hz objective: 5049697.440
val: epoch 01:   1/ 10: 87.3 (87.3) Hz objective: 49.512
val: epoch 01:   2/ 10: 88.9 (90.5) Hz objective: 50.012
...
val: epoch 01:   9/ 10: 88.2 (88.2) Hz objective: 49.936
val: epoch 01:  10/ 10: 88.1 (87.3) Hz objective: 49.962
train: epoch 02:   1/ 40: 30.2 (30.2) Hz objective: 49.650
train: epoch 02:   2/ 40: 30.3 (30.4) Hz objective: 49.704
...
train: epoch 02:  39/ 40: 30.2 (31.6) Hz objective: 49.739
train: epoch 02:  40/ 40: 30.3 (31.0) Hz objective: 49.722
val: epoch 02:   1/ 10: 91.8 (91.8) Hz objective: 49.687
val: epoch 02:   2/ 10: 92.0 (92.2) Hz objective: 49.831
...
val: epoch 02:   9/ 10: 92.0 (88.5) Hz objective: 49.931
val: epoch 02:  10/ 10: 91.9 (91.1) Hz objective: 49.962
train: epoch 03:   1/ 40: 31.7 (31.7) Hz objective: 49.014
train: epoch 03:   2/ 40: 31.2 (30.8) Hz objective: 49.237
...

Here is my network schema image:

1 Answer


The two inputs of pdist have size n×m×64×100, and, as the vl_nnpdist documentation notes, the output of pdist keeps the same height and width but has a depth of one: the distance is computed across the channel dimension. As for the correctness of your error definition, you should debug it and check the sizes and the definition carefully.
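
To make the shape claim concrete, here is a minimal check (sizes taken from the question; vl_nnpdist called with the default Euclidean p = 2):

x  = randn(56, 56, 64, 100, 'single') ;  % network output
x0 = randn(56, 56, 64, 100, 'single') ;  % regression targets
y  = vl_nnpdist(x, x0, 2) ;              % per-location Euclidean distance
size(y)                                  % 56 56 1 100: channels are summed out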

– Hossein Kashiani
  • Thanks, I added an image too. Can I change this line in `pdist` from `y1 = sqrt(sum(d.*d,3)) ;` to `y1 = sqrt(d.*d) ;`? –  Jan 15 '18 at 17:07
  • That way you turn the output into 56×56×64×100, and its dimension no longer matches [this](http://www.vlfeat.org/matconvnet/mfiles/vl_nnpdist/). Pay heed that if you want to redefine the distance, you must also adapt the backward pass, not just the forward pass. – Hossein Kashiani Jan 15 '18 at 17:56
  • But I can train and there is no error. So what is your suggestion? –  Jan 15 '18 at 18:20
  • That makes no sense: you modified the forward pass but did not modify the backprop (see the sketch after these comments). – Hossein Kashiani Jan 15 '18 at 18:27
  • My last layer outputs `56*56*64` and my target is `56*56*64`, so they are equal. So in the backprop, after the loss-layer computation, I get `56*56*64` too; what is wrong in the backprop? Everything seems OK (sorry, I'm confused) –  Jan 15 '18 at 18:35
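
To illustrate the point of the last two comments: if the forward pass is changed to the element-wise sqrt(d.*d) (which is just |d|), the backward pass has to be changed to match. Below is a hypothetical sketch of such a consistent pair; elemwiseL1 is a made-up name for illustration and is not the stock vl_nnpdist, though it mimics its convention of switching to backward mode when a derivative argument is supplied.

function y = elemwiseL1(x, x0, dzdy)
% Forward: y = |x - x0| element-wise, so the depth stays 64 instead of 1.
% Backward: d|d|/dx = sign(d), applied element-wise to the incoming dzdy.
d = x - x0 ;
if nargin < 3 || isempty(dzdy)
  y = abs(d) ;            % forward pass
else
  y = dzdy .* sign(d) ;   % backward pass: gradient with respect to x
end

Whichever distance you pick, the forward and backward definitions must agree; training running without a runtime error says nothing about the gradients being correct.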