
I am trying to understand DNNs with MatConvNet's DagNN. I have a question based on the following last two layers of a net that uses a Euclidean loss for regression:

net.addLayer('fc9', dagnn.Conv('size', [1 1 4096 1], 'hasBias', true, 'stride', [1,1], 'pad', [0 0 0 0]), {'drop8'}, {'prediction'},  {'conv10f'  'conv10b'});
 net.addLayer('l2_loss', dagnn.L2Loss(), {'prediction', 'label'}, {'objective'});

where the code for L2Loss is

function Y = vl_nnL2(X, c, dzdy)
    % reshape the labels so they have the same size as the predictions
    c = reshape(c, size(X));
    if nargin == 2 || (nargin == 3 && isempty(dzdy))
        % forward pass: element-wise squared difference
        diff_xc = bsxfun(@minus, X, c);
        Y = diff_xc.^2;
    elseif nargin == 3 && ~isempty(dzdy)
        % backward pass: (X - c) scaled by the incoming derivative dzdy
        Y = (X - c) .* dzdy;
    end
end

X is the output of the fc9 layer, a vector whose length equals the batch size (100), and c contains the labels.

  1. In the loss function, how can the two be compared? X is an activation vector, not a probability, I guess, and c contains labels, integer values ranging from 0 to 10. So how can they be compared and subtracted, for instance? I don't know if there is any relationship between the two.
  2. Also, how does backpropagation compare the fc9 output with the labels for minimization?

----- Edit: new modified L2 regression function -----

function Y = vl_nnL2_(X, c, dzdy)
    c = reshape(c, size(X));
    [~, chat]  = max(X, [], 3);   % index of the largest prediction along dim 3
    [~, lchat] = max(c, [], 3);   % index of the one-hot label along dim 3
    if nargin == 2 || (nargin == 3 && isempty(dzdy))
        % forward: squared difference of the argmax indices, summed over the batch
        t = (chat - lchat).^2;
        Y = sum(sum(t));
    elseif nargin == 3 && ~isempty(dzdy)
        % backward: replicate the index differences back to the size of X
        ch  = squeeze(chat);
        aa1 = repmat(ch', 35, 1);
        lch = squeeze(lchat);
        aa2 = repmat(lch', 35, 1);
        Y = dzdy .* (aa1 - aa2) * 2;
        Y = single(reshape(Y, size(X)));
    end
end

[Plot of the network training objective attached to the question]

h612

1 Answer


"if nargin == 2 || (nargin == 3 && isempty(dzdy))" checks if it's forward mode.

In the forward mode, you compute (prediction - label).^2:

diff_xc=(bsxfun(@minus, X,(c)));
Y=diff_xc.^2;

The derivative of L2 loss w.r.t. prediction is 2*(prediction - label). Thus we have

Y=(X-c).*dzdy;

in your code. Here the author of your code was not rigorous enough to include the constant factor 2*, but in general it will still work, since it is just a constant scaling factor on your gradients. dzdy is the gradient coming from the downstream layers; if this layer is the last one, dzdy = 1, which MatConvNet supplies for you.
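
For reference, here is a minimal sketch (the name vl_nnL2_strict is my own) of the same loss with the constant 2* kept in the backward pass:

function Y = vl_nnL2_strict(X, c, dzdy)
    % labels must match the prediction shape
    c = reshape(c, size(X));
    if nargin == 2 || (nargin == 3 && isempty(dzdy))
        % forward: element-wise squared error (prediction - label).^2
        Y = (X - c).^2;
    else
        % backward: d/dX of (X - c).^2 is 2*(X - c), scaled by the
        % incoming gradient; MatConvNet passes dzdy = 1 for the last layer
        Y = 2 * (X - c) .* dzdy;
    end
end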

c must be of the same size as X, since it is regression.
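
If your labels are integers (say 1 to numClasses), one way to get c into the same shape as X is to expand each integer into a one-hot vector along the third dimension. A minimal sketch, assuming a variable labels that holds one integer label per image in the batch:

numClasses = 10;                          % e.g. labels in 1..10
batchSize  = numel(labels);               % labels is a 1 x batchSize integer vector
c = zeros(1, 1, numClasses, batchSize, 'single');
for i = 1:batchSize
    c(1, 1, labels(i), i) = 1;            % one-hot encoding along dimension 3
end
% X must then also have size 1 x 1 x numClasses x batchSize, i.e. the fc9
% convolution size becomes [1 1 4096 numClasses] instead of [1 1 4096 1].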

More comments coming. Let me know if you have other questions. I'm pretty familiar with MatConvNet.

DataHungry
  • thank you. (1) I'm also curious about X, which is the feature vector. The values of X are below 1, while the values of c, i.e. the labels, are integers in [1:10]. How will this loss (X-c) work when the two have no connection / are of a different nature? (2) How can I add an SVR for regression in MatConvNet, just to be sure of regression? – h612 May 22 '17 at 12:23
  • @h612 (1) if c is an integer in [1:10] then you do not have regression labels; they are classification labels. Why are you doing L2 loss with classification labels? I guess you have to convert each integer into a one-hot vector, and the network also has to use a one-hot vector as output. (2) You can probably use LIBSVM's MATLAB API; MatConvNet doesn't have SVR. – DataHungry May 23 '17 at 00:55
  • This is the first time I saw my network converge; I can't express my gratitude. I've estimated with a one-hot vector on the output, but the results are too high, e.g. the image patch which should be 2 comes out as 10, etc. That makes the overall result too big. Any thoughts? – h612 May 24 '17 at 15:29
  • @h612 what do you mean by "results are too high"? I don't follow. – DataHungry May 26 '17 at 23:15
  • Every patch has one label assigned to it, ranging from 1-10. However, the assigned label is usually much greater than the ground truth; if it's supposed to be 2, it's assigned 10, for instance. Is it because my training data is imbalanced? I've only 10% of training images with labels > 5; mostly the data has labels in the range 1-5. – h612 May 27 '17 at 14:25
  • @h612. What do these labels represent? Are you already using one-hot vector? Imbalanced data indeed might cause this problem. But I don't understand why it assigns the minority label (e.g., since only 10% data has label > 5, but it assigns 10). – DataHungry May 27 '17 at 21:25
  • the Euclidean loss function is not working; I used logistic regression and the network converged, but not with Euclidean-distance-based regression. Also, should the 'fc9' layer look like this, or is the following for regression too: classLabels=max(unique(imdb_32.images.labels)); net.addLayer('fc2', dagnn.Conv('size', [1 1 40 1], 'hasBias', true, 'stride', [1,1], 'pad', [0 0 0 0]), {'drop3'}, {'prediction'}, {'conv4f' 'conv4b'}); – h612 May 28 '17 at 19:16
  • I tried with the 'fc9' layer net.addLayer('fc9', dagnn.Conv('size', [1 1 4096 1],...... so in the L2Loss, c is 10x50 (there are at most 10 classes; each column is a one-hot vector) and X has size [1,1,1,50]. Does each row in X represent a probability of the correct prediction? Stuck on the same question as in the original post. – h612 May 28 '17 at 19:24
  • @h612 I assume 50 is the batch size. If you have 10 classes, why don't you do classification using a cross-entropy loss? Why do you want to do regression? If you insist on doing regression, you need to convert c into 1x1x10x50, and X to 1x1x10x50 so that they have the same size, in which case your fc9 should have conv size [1 1 4096 10]. – DataHungry May 29 '17 at 03:52
  • So I've modified the L2 loss as the function .... with c and X of size [1 1 10 50]. Kindly see my edited question. Also, the result of network training is attached to the question. – h612 May 30 '17 at 17:31
  • I thought of making this change because c is a one-hot matrix with values of 0 or 1, and X is a real-valued matrix. When (X-c)^2 is used, the loss reduces, but the (training) objective does not converge below a value near 0.424. – h612 May 30 '17 at 17:48
  • @h612 you shouldn't do this: [~,chat] = max(X,[],3) ; [~,lchat] = max(c,[],3) ; If c has the same shape as X, just do (X-c).*dzdy directly. – DataHungry May 30 '17 at 22:03
  • okay. So regression isn't really working..how can I do classification? classLabels=max(unique(imdb_32.images.labels)); net.addLayer('fc2', dagnn.Conv('size', [1 1 40 classLabels], 'hasBias', true, 'stride', [1,1], 'pad', [0 0 0 0]), {'drop3'}, {'prediction'}, {'conv4f' 'conv4b'}); net.addLayer('error', dagnn.Loss('loss', 'softmaxlog'), {'prediction','label'}, 'objective') ; – h612 May 31 '17 at 22:19
  • net.addLayer('fc2', dagnn.Conv('size', [1 1 40 1], 'hasBias', true, 'stride', [1,1], 'pad', [0 0 0 0]), {'drop3'}, {'prediction'}, {'conv4f' 'conv4b'}); -------------------------------------------- net.addLayer('objective', dagnn.Loss('loss', 'softmaxlog'), {'prediction', 'label'}, {'objective'}); – h612 May 31 '17 at 22:29
  • The loss layer softmaxlog wasn't expecting a one-hot label input. :/ So I've followed another classification tutorial where-----------classLabels=max(unique(imdb_32.images.labels)); ---net.addLayer('fc2', dagnn.Conv('size', [1 1 40 classLabels], 'hasBias', true, 'stride', [1,1], 'pad', [0 0 0 0]), {'drop3'}, {'prediction'}, {'conv4f' 'conv4b'}); ----------------------net.addLayer('prob', dagnn.SoftMax(), {'prediction'}, {'prob'}, {});----- net.addLayer('objective', dagnn.Loss('loss', 'log'), {'prob', 'label'}, {'objective'}, {}); – h612 Jun 01 '17 at 08:39
  • The results are far from accurate – h612 Jun 01 '17 at 15:39
  • @h612 the softmaxlog layer expects an integer label c, but a one-hot format for X – DataHungry Jun 01 '17 at 21:07
  • In that case, how can I compute the Euclidean loss? X is a one-hot vector and c is an integer label. Y=(X-c)^2? – h612 Jul 09 '17 at 14:18