Looking at sigmoid_cross_entropy_loss_layer.cpp, the implementation of the sigmoid cross-entropy loss in Caffe, I noticed that the code computing the actual loss value is
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
      log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
}
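To make sure I'm reading it right: since (input_data[i] >= 0) is a bool that promotes to 0 or 1, I believe the loop is equivalent to the explicit two-branch version below (x and t are just my shorthand for input_data[i] and target[i]):

for (int i = 0; i < count; ++i) {
  const double x = input_data[i];  // raw input to the sigmoid
  const double t = target[i];      // target label
  if (x >= 0) {
    // indicator is 1, so the exponent x - 2*x collapses to -x
    loss -= x * (t - 1) - log(1 + exp(-x));
  } else {
    // indicator is 0, so the exponent is just x
    loss -= x * t - log(1 + exp(x));
  }
}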
which seems rather different from the CE loss function described in the Caffe documentation, or from the C++ implementation I found here:
https://visualstudiomagazine.com/Articles/2014/04/01/Neural-Network-Cross-Entropy-Error.aspx?Page=2
or, for that matter, from the very definition of the CE loss function.
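For reference, the definition I have in mind, writing p = 1 / (1 + exp(-x)) for the sigmoid of the input x and t for the target, is

loss = -( t * log(p) + (1 - t) * log(1 - p) )

summed over all elements.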
Is this some sort of approximation? At first I thought it was a Taylor series expansion of log(1 - x), but it doesn't work out like that at all.