
Below is my last layer in training net:

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "final"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 255
    normalization: VALID
  }
}

Note that I use a softmax loss layer. Since it computes −log(probability), it seems strange that the loss can be negative, as shown below (iteration 80).

I0404 23:32:49.400624  6903 solver.cpp:228] Iteration 79, loss = 0.167006
I0404 23:32:49.400806  6903 solver.cpp:244]     Train net output #0: loss = 0.167008 (* 1 = 0.167008 loss)
I0404 23:32:49.400825  6903 sgd_solver.cpp:106] Iteration 79, lr = 0.0001
I0404 23:33:25.660655  6903 solver.cpp:228] Iteration 80, loss = -1.54972e-06
I0404 23:33:25.660845  6903 solver.cpp:244]     Train net output #0: loss = 0 (* 1 = 0 loss)
I0404 23:33:25.660862  6903 sgd_solver.cpp:106] Iteration 80, lr = 0.0001
I0404 23:34:00.451464  6903 solver.cpp:228] Iteration 81, loss = 1.89034
I0404 23:34:00.451661  6903 solver.cpp:244]     Train net output #0: loss = 1.89034 (* 1 = 1.89034 loss) 

Can anyone explain this to me? How can this happen? Thank you very much!

PS: The task here is semantic segmentation. There are 20 object classes plus background, so 21 classes in total. The labels range from 0 to 20. The extra label 255 is ignored, as can be seen in the SoftmaxWithLoss definition at the beginning of this post.

huangh12
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. – Prune Apr 04 '17 at 18:48
    That said, it's still hard to guess without seeing the values used, or enough output to see the variability in loss function. It's vaguely possible that this is actually a loss very close to 0, but pushed over the edge by round-off error. I suspect more that there's a computational error within the model that runs that probability value over the theoretical 1.0 boundary. – Prune Apr 04 '17 at 18:50

1 Answer


Is Caffe running on the GPU or the CPU? Print out the prob_data that you get after the softmax operation:

// find the next line in your cpu or gpu Forward function
softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_);
// make sure the blob's data is synced to the CPU before reading it
const Dtype* prob_data = prob_.cpu_data();

// dump every probability; any value outside [0, 1] points to a bug
for (int i = 0; i < prob_.count(); i++) {
    printf("%f ", prob_data[i]);
}
printf("\n");
sbond