
The Caffe documentation for softmax_loss_layer.hpp seems to be targeted at classification tasks rather than semantic segmentation. However, I have seen this layer used for the latter.

  1. What would be the dimensions of the input blobs and output blob in the case where you're classifying each pixel (semantic segmentation)?
  2. More importantly, how are the equations for the loss applied to these blobs? That is, how are the blobs arranged along each axis, and what is the equation for the scalar "loss value" that's output?

Thank you.

Edit: I have referenced this page to understand the loss equation itself; I just don't know how it's applied to the blobs (along which axis, etc.): http://cs231n.github.io/linear-classify/

Here is the documentation from Caffe: the "softmax with loss" layer description.


1 Answer


Firstly, the input blobs should be data of shape NxKxHxW and label of shape Nx1xHxW, where each value in the label blob is an integer class index in [0, K-1]. I think the Caffe documentation is misleading here because it only considers the classification case, and I'm not sure what K = CHW is supposed to mean. The output blob has shape 1x1x1x1 and holds the scalar loss.
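To make those shapes concrete, here is a minimal sketch of the shape bookkeeping (the variable names match the snippet below; the sizes, e.g. 21 PASCAL-VOC-style classes at 256x256, are just example values I picked):

    // Example shape bookkeeping for a segmentation batch (illustrative values).
    const int N = 4, K = 21, H = 256, W = 256;  // batch, classes, height, width
    const int outer_num_ = N;                   // number of images in the batch
    const int inner_num_ = H * W;               // number of pixels per image
    const int dim = K * inner_num_;             // values per image in the prob blob
    // data/prob blob: N*K*H*W floats; label blob: N*H*W integers in [0, K-1]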

Secondly, the loss function is as follows, from softmax_loss_layer.cpp:

loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j], Dtype(FLT_MIN)));
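For context, that line sits inside a loop over images and pixels; roughly (my paraphrase of the forward pass, not verbatim Caffe source):

    // Paraphrased forward pass: accumulate -log(p_true_class) over every pixel.
    Dtype loss = 0;
    for (int i = 0; i < outer_num_; ++i) {      // each image in the batch
      for (int j = 0; j < inner_num_; ++j) {    // each of the H*W pixels
        const int label_value = static_cast<int>(label[i * inner_num_ + j]);
        loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                             Dtype(FLT_MIN)));
      }
    }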

Breaking that line down (for semantic segmentation):

  1. std::max clamps the probability to FLT_MIN so the log never sees zero, which would otherwise produce -inf (and nan gradients) in the loss
  2. prob_data is the output of the softmax; as the Caffe tutorials explain, the softmax loss layer can be decomposed into a softmax layer followed by a multinomial logistic loss
  3. i * dim jumps to the i-th image in the batch, where dim = K*H*W is the number of values per image in the NxKxHxW blob and K is the number of classes
  4. label_value * inner_num_ then jumps to the channel of the ground-truth class, where inner_num_ = H*W, because at this stage each one of your classes has its own "image" of probabilities, so to speak
  5. Finally, j is the index of the pixel within that HxW probability map (a small worked example follows this list)
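Here is the worked example promised above: a self-contained program (plain C++, no Caffe; all sizes and probabilities are made up for illustration) that applies the same indexing and loss to a 1-image, 3-class, 2x2-pixel "batch":

    #include <algorithm>
    #include <cfloat>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
      const int N = 1, K = 3, H = 2, W = 2;   // 1 image, 3 classes, 2x2 pixels
      const int inner_num = H * W;            // 4 pixels per image
      const int dim = K * inner_num;          // 12 values per image

      // Softmax output, NxKxHxW flattened: each pixel's K probabilities sum to 1.
      // Layout: prob[i*dim + k*inner_num + j] = P(class k | pixel j of image i).
      std::vector<float> prob = {
        /* class 0 */ 0.7f, 0.1f, 0.2f, 0.3f,
        /* class 1 */ 0.2f, 0.8f, 0.3f, 0.3f,
        /* class 2 */ 0.1f, 0.1f, 0.5f, 0.4f,
      };
      // Ground truth, Nx1xHxW flattened: one class index in [0, K-1] per pixel.
      std::vector<int> label = {0, 1, 2, 1};

      float loss = 0.f;
      for (int i = 0; i < N; ++i) {
        for (int j = 0; j < inner_num; ++j) {
          const int label_value = label[i * inner_num + j];
          loss -= std::log(std::max(prob[i * dim + label_value * inner_num + j],
                                    FLT_MIN));
        }
      }
      loss /= N * inner_num;  // average over all pixels
      // Prints -(log 0.7 + log 0.8 + log 0.5 + log 0.3) / 4, about 0.619
      std::printf("loss = %f\n", loss);
      return 0;
    }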

Basically, you want prob_data[i * dim + label_value * inner_num_ + j] to be as close to 1 as possible for each pixel, which makes its negative log close to 0. The log here is the natural log (base e). Stochastic gradient descent then drives this loss down.
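In equation form (my restatement of the code above, assuming a simple average over all N*H*W pixels; Caffe's actual normalization is configurable), with p_{i,k,j} the softmax probability of class k at pixel j of image i and l_{i,j} the ground-truth class:

    L = -\frac{1}{N \cdot H \cdot W} \sum_{i=1}^{N} \sum_{j=1}^{H \cdot W}
        \log p_{i,\, l_{i,j},\, j}

For reference, the standard softmax-loss gradient that SGD uses, taken with respect to the pre-softmax score z_{i,k,j}, is then

    \frac{\partial L}{\partial z_{i,k,j}} =
        \frac{1}{N \cdot H \cdot W}\left( p_{i,k,j} - \mathbf{1}[k = l_{i,j}] \right)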
