
I have trained a NN for a regression problem. My data type is HDF5_DATA, built from .jpg images (3x256x256) and a float label array (3 labels per image). This is the dataset-creation script:

import h5py, os
import caffe
import numpy as np

SIZE = 256 # images size
with open( '/home/path/trainingTintText.txt', 'r' ) as T :
    lines = T.readlines()

X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' )  # N x 3 x 256 x 256 images
labels = np.zeros( (len(lines),3), dtype='f4' )          # 3 float labels per image

for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )                   # float image in [0,1], H x W x 3
    img = caffe.io.resize( img, (SIZE, SIZE, 3) )
    transposed_img = img.transpose((2,0,1))[::-1,:,:]    # HWC->CHW, RGB->BGR
    X[i] = transposed_img*255                            # scale back up to [0,255]
    print X[i]
    labels[i,0] = float(sp[1])
    labels[i,1] = float(sp[2])
    labels[i,2] = float(sp[3])

with h5py.File('/home/path/train.h5','w') as H:
    H.create_dataset('data', data=X)
    H.create_dataset('label', data=labels)

with open('/home/path/train_h5_list.txt','w') as L:
    L.write( '/home/path/train.h5' )

This is the (partial) architecture:

name: "NN"

layers {
  name: "NNd"
  top: "data"
  top: "label"
  type: HDF5_DATA
  hdf5_data_param {
   source: "/home/path/train_h5_list.txt"
   batch_size: 64
  }
    include: { phase: TRAIN }

}

layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/home/path/train_h5_list.txt"
    batch_size: 100

  }
  include: { phase: TEST }
}

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 11
    stride: 2

    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}


layers {
  name: "ip2"
  type: INNER_PRODUCT
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 3

    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}

layers {
  name: "relu22"
  type: RELU
  bottom: "ip2"
  top: "ip2"
}

layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

When I train the NN I get very high loss values:

I1117 08:15:57.707001  2767 solver.cpp:337] Iteration 0, Testing net (#0)
I1117 08:15:57.707033  2767 net.cpp:684] Ignoring source layer fkp
I1117 08:15:59.111842  2767 solver.cpp:404]     Test net output #0: loss = 256.672 (* 1 = 256.672 loss)
I1117 08:15:59.275205  2767 solver.cpp:228] Iteration 0, loss = 278.909
I1117 08:15:59.275255  2767 solver.cpp:244]     Train net output #0: loss = 278.909 (* 1 = 278.909 loss)
I1117 08:15:59.275276  2767 sgd_solver.cpp:106] Iteration 0, lr = 0.01
I1117 08:16:57.115145  2767 solver.cpp:337] Iteration 100, Testing net (#0)
I1117 08:16:57.115486  2767 net.cpp:684] Ignoring source layer fkp
I1117 08:16:58.884704  2767 solver.cpp:404]     Test net output #0: loss = 238.257 (* 1 = 238.257 loss)
I1117 08:16:59.026926  2767 solver.cpp:228] Iteration 100, loss = 191.836
I1117 08:16:59.026971  2767 solver.cpp:244]     Train net output #0: loss = 191.836 (* 1 = 191.836 loss)
I1117 08:16:59.026993  2767 sgd_solver.cpp:106] Iteration 100, lr = 0.01
I1117 08:17:56.890614  2767 solver.cpp:337] Iteration 200, Testing net (#0)
I1117 08:17:56.890880  2767 net.cpp:684] Ignoring source layer fkp
I1117 08:17:58.665057  2767 solver.cpp:404]     Test net output #0: loss = 208.236 (* 1 = 208.236 loss)
I1117 08:17:58.809150  2767 solver.cpp:228] Iteration 200, loss = 136.422
I1117 08:17:58.809248  2767 solver.cpp:244]     Train net output #0: loss = 136.422 (* 1 = 136.422 loss)

When I divide the images and the label arrays by 255 I get very low loss values (near 0). What is the reason for those loss values? Am I doing something wrong? Thanks.
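
To be concrete, the scaling I mean corresponds roughly to this change in the dataset script above (just a sketch, not the exact code I ran):

    # sketch: keep the image in [0,1] (caffe.io.load_image already returns floats in [0,1])
    X[i] = transposed_img
    # ...and scale the labels into [0,1] as well
    labels[i,0] = float(sp[1]) / 255.0
    labels[i,1] = float(sp[2]) / 255.0
    labels[i,2] = float(sp[3]) / 255.0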

Z.Kal
    Cross-posted: http://stackoverflow.com/q/40280068/781723, http://cs.stackexchange.com/q/65185/755. Please [do not post the same question on multiple sites](http://meta.stackexchange.com/q/64068). Each community should have an honest shot at answering without anybody's time being wasted. – D.W. Oct 27 '16 at 10:40
  • Got it, thanks. Do you have an answer for my question? – Z.Kal Oct 27 '16 at 10:41
  • @D.W. I edited my question; can you please help me with my issue? Thanks – Z.Kal Nov 17 '16 at 10:12

1 Answer


With the Euclidean loss, this is only to be expected. The Euclidean loss should be smaller by a factor of 256 if you divide all of the labels by 256 and re-train. It doesn't mean that dividing the labels by 256 makes the network become any better at predicting the labels; you've just changed the "scale" (the "units").

In particular, the Euclidean loss is (roughly) $L = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$, where $x$ is the correct answer and $y$ is the output from the neural network. Suppose you divide every $x$ by 256, then re-train. The neural network will learn to divide its output $y$ by 256. How will this affect the Euclidean loss $L$? Well, if you work through the math, you'll find that $L$ shrinks by a factor of 256.
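
Spelling that step out: with $x_i' = x_i/256$ and $y_i' = y_i/256$,

$$L' = \sqrt{\left(\frac{x_1}{256} - \frac{y_1}{256}\right)^2 + \left(\frac{x_2}{256} - \frac{y_2}{256}\right)^2} = \frac{1}{256}\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} = \frac{L}{256}.$$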

It'd be like the difference between trying to predict a distance in feet vs a distance in yards. The latter would involve dividing by 3. Conceptually, the overall accuracy of the network would remain the same; but the Euclidean loss would be divided by a factor of three, because you've changed the units from feet to yards. An average error of 0.1 feet would correspond to an average error of 0.0333 yards; conceptually that's the "same" accuracy, even though 0.0333 looks like a smaller number than 0.1.

Dividing the images by 256 should be irrelevant. It's dividing the labels by 256 that caused the reduction in the loss function.
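
Here's a quick numerical check (a minimal numpy sketch; the label and prediction values are made up, and `euclidean_loss` uses the rough $\ell_2$ form from above rather than Caffe's exact EuclideanLoss layer):

    import numpy as np

    labels = np.array([120.0, 64.0, 200.0])   # made-up ground-truth labels
    preds  = np.array([110.0, 70.0, 190.0])   # made-up network outputs

    def euclidean_loss(y_true, y_pred):
        # rough l2-norm form of the loss discussed above
        return np.sqrt(np.sum((y_true - y_pred) ** 2))

    loss_raw    = euclidean_loss(labels, preds)              # original scale
    loss_scaled = euclidean_loss(labels / 256, preds / 256)  # both divided by 256

    print(loss_raw / loss_scaled)   # ratio is 256: same "accuracy", smaller number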

D.W.
  • Thanks for answering, but I did not understand your solution. I used `EUCLIDEAN_LOSS` and used the images and labels as they are, without dividing them by 256. – Z.Kal Nov 17 '16 at 21:11
  • @Z.Kal, see the updated answer. I don't know what your comment means, as it seems to contradict the last few sentences of your question -- your question says you divided by 256, your comment says you didn't, so which is the real situation? I'm responding to the last few sentences of your question. If your question does not correspond to what you actually did, then you need to edit the question. – D.W. Nov 17 '16 at 21:38
  • Got it! In your opinion, what should I do to improve the NN accuracy and reduce the loss? – Z.Kal Nov 20 '16 at 07:19