
Looking at sigmoid_cross_entropy_loss_layer.cpp, which implements the cross-entropy loss function in Caffe, I noticed that the code computing the actual loss value is

  for (int i = 0; i < count; ++i) {
    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
  }

which seems rather different from the CE loss function in the Caffe documentation, or from the C++ implementation I found here:

https://visualstudiomagazine.com/Articles/2014/04/01/Neural-Network-Cross-Entropy-Error.aspx?Page=2

or, for that matter, from the definition of the CE loss function itself.

Is this some sort of approximation? I first thought it was a Taylor series expansion of log(1 - x), but it doesn't work out that way at all.

Alex

1 Answer


The loss implemented by this layer is not just cross-entropy: the layer implements a sigmoid activation followed by a cross-entropy loss, and fusing the two allows for a more numerically stable implementation of the loss.
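
To see why this matters, here is a minimal standalone sketch (not taken from Caffe; the function names are mine). Substituting p = 1/(1+exp(-x)) into -t*log(p) - (1-t)*log(1-p) and simplifying gives x - x*t + log(1+exp(-x)); for negative x the exp(-x) term would overflow, so the loop switches to the equivalent -x*t + log(1+exp(x)), which is what the (input_data[i] >= 0) indicator does. Both branches can be written compactly as max(x, 0) - x*t + log(1 + exp(-|x|)):

  // A minimal standalone sketch (not Caffe code) contrasting the naive
  // sigmoid-then-cross-entropy computation with the fused, numerically
  // stable form that the Caffe loop above reduces to.
  #include <algorithm>
  #include <cmath>
  #include <cstdio>

  // Naive form: apply the sigmoid, then evaluate cross-entropy on its output.
  // For large |x| this hits log(0) or loses all precision.
  double naive_sigmoid_ce(double x, double t) {
    double p = 1.0 / (1.0 + std::exp(-x));
    return -t * std::log(p) - (1.0 - t) * std::log(1.0 - p);
  }

  // Fused form: max(x, 0) - x*t + log(1 + exp(-|x|)).
  // The exp() argument is always <= 0, so it can never overflow; this is the
  // same quantity the (input_data[i] >= 0) indicator selects in the Caffe loop.
  double stable_sigmoid_ce(double x, double t) {
    return std::max(x, 0.0) - x * t + std::log(1.0 + std::exp(-std::fabs(x)));
  }

  int main() {
    // With target t = 1, the two agree for moderate x, but the naive form
    // collapses to NaN at x = 50 because p rounds to exactly 1.0 and
    // 0 * log(0) is undefined; the stable form stays finite.
    for (double x : {0.5, 5.0, 50.0, -50.0}) {
      std::printf("x = %6.1f   naive = %g   stable = %g\n",
                  x, naive_sigmoid_ce(x, 1.0), stable_sigmoid_ce(x, 1.0));
    }
    return 0;
  }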

See this thread for more information, and also this thread.

Shai
  • Ah, OK then: I plugged phat = 1/(1+exp(-x)) into the CE equation, and I'm only getting the result for x_n < 0: -input_data[i]*target[i] + log(1+exp(input_data[i])), after x_n[i] cancels out. I don't see a problem with log(negative number) anywhere. How did you get the input_data[i] >= 0? – Alex Jun 12 '17 at 13:36
  • @Alex I'm sorry. I can't understand your comment. Can you please edit your question? – Shai Jun 12 '17 at 14:03
  • I'm sorry, I don't think SO supports MathJax, so the formula is hard to read. I'm referring to lines 95 and 96 in sigmoid_cross_entropy_loss_layer.cpp, and to the term input_data[i] >= 0. How did you get it? – Alex Jun 12 '17 at 14:44