
I have a neural network, trained on MNIST, with categorical cross entropy as its loss function.

For theoretical purposes, my output layer is ReLU, so a lot of its outputs are 0.

Now I stumbled across the following question:

Why don't I get a lot of errors? There will certainly be a lot of zeros in my output, and the loss takes the log of them.

Here, for convenience, is the formula for categorical cross entropy:

L = -\sum_{i=1}^{m} \sum_{j} t_{i,j} \log y_{i,j}

where t_{i,j} are the one-hot targets and y_{i,j} are the network outputs.
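For concreteness, a tiny NumPy sketch of what I expected to happen (just an illustration, not my actual training code): plugging an output with an exact zero at the true class into this formula blows up.

  import numpy as np

  t = np.array([0., 1., 0.])     # one-hot target
  y = np.array([0.2, 0.0, 0.8])  # ReLU-style output, exact zero at the true class

  loss = -np.sum(t * np.log(y))  # naive categorical cross entropy
  print(loss)                    # inf, plus a divide-by-zero warning from np.log(0)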

Martin Thoma
snoozzz
  • Are you sure you use ReLU for your output layer? Maybe you are actually using the usual softmax function without realizing it. I suggest you post your code if you still need help. – mcb Jan 12 '18 at 12:25
  • I can't find the exact code anymore, but I am certain that I used ReLU in the output layer. The whole thing is still a mystery to me. But obviously not an urgent one :) – snoozzz Jan 12 '18 at 14:20

2 Answers


It's not documented at https://keras.io/losses/#categorical_crossentropy and it seems to depend on the backend, but I'm quite sure that they don't compute log y directly, but rather log(y + epsilon), where epsilon is a small constant that prevents log(0).
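In plain NumPy the difference looks roughly like this (just a sketch; the exact constant and implementation depend on the backend):

  import numpy as np

  eps = 1e-7                     # Keras' default fuzz factor (keras.backend.epsilon())
  t = np.array([0., 1., 0.])     # one-hot target
  y = np.array([0.2, 0.0, 0.8])  # output with an exact zero at the true class

  naive = -np.sum(t * np.log(y))        # inf
  safe  = -np.sum(t * np.log(y + eps))  # roughly 16.12 = -log(1e-7), finite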

Martin Thoma

Keras clips the network output to the range [1e-7, 1 - 1e-7] and then adds the same constant to the clipped output before taking the logarithm, as defined here:

  epsilon_ = _constant_to_tensor(epsilon(), output.dtype.base_dtype)
  output = clip_ops.clip_by_value(output, epsilon_, 1. - epsilon_)

  # Compute cross entropy from probabilities.
  bce = target * math_ops.log(output + epsilon())
  bce += (1 - target) * math_ops.log(1 - output + epsilon())
  return -bce

Why Keras adds epsilon again to the clipped output is a mystery to me.
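Either way, the net effect is that an output of exactly 0 enters the log as epsilon + epsilon = 2e-7 rather than 0. A plain-NumPy sketch mirroring the snippet above (not the actual backend code):

  import numpy as np

  eps = 1e-7
  output = np.array([0.0, 0.5, 1.0])
  clipped = np.clip(output, eps, 1. - eps)  # exact zeros become 1e-7, exact ones become 1 - 1e-7
  print(np.log(clipped + eps))              # finite everywhere, no -inf or nan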

CodeWarrior