I have a neural network, trained on MNIST, with categorical cross entropy as its loss function.
For theoretical purposes my output layer is ReLU, so many of its outputs are exactly 0.
Now I stumbled across the following question:
Why don't I get a lot of errors, since there will certainly be many zeros in my output, of which the loss then takes the log?
Here, for convenience, is the formula for categorical cross entropy:
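$$\mathcal{L}(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i)$$

where $y$ is the one-hot target and $\hat{y}$ is the network's output.

To illustrate what I mean, here is a minimal numpy sketch (not my actual training code) of what I would expect the naive computation to do when some predictions are exactly zero:

```python
import numpy as np

# Hypothetical example: a one-hot target and a prediction vector
# in which some entries are exactly zero, as a ReLU output layer can produce.
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.0, 0.7, 0.3])

# Naive categorical cross entropy: -sum_i y_i * log(p_i)
# np.log(0.0) evaluates to -inf (with a RuntimeWarning),
# and 0 * -inf evaluates to nan, so the whole sum becomes nan.
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # nan
```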