I was looking at TensorFlow's basic neural network tutorial for beginners [1]. I am having trouble understanding how the cross-entropy value is calculated and how it is used. In the example a placeholder is created to hold the correct labels:
y_ = tf.placeholder(tf.float32, [None, 10])
and the cross-entropy, -sum(y' * log(y)), is calculated as follows:
reduct = -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])
cross_entropy = tf.reduce_mean( reduct )
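For concreteness, here is a small numpy sketch (made-up values, a batch of 2 examples and 3 classes) of what I think these two lines do mechanically:

import numpy as np

# one-hot labels and softmax outputs, both [batch x classes]
y_true = np.array([[0., 1., 0.],
                   [1., 0., 0.]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1]])

# reduce_sum with reduction_indices=[1]: sum across the class axis,
# giving one cross-entropy value per example -> shape [batch]
per_example = -np.sum(y_true * np.log(y_pred), axis=1)

# reduce_mean: average over the batch -> a single scalar
cross_entropy = np.mean(per_example)
print(per_example)    # [0.2231..., 0.5108...]
print(cross_entropy)  # 0.3669...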
Looking at the dimensions, I assume we have (with * denoting element-wise multiplication):
y_ * log(y) = [batch x classes] * [batch x classes]
y_ * log(y) = [batch x classes]
And a quick check confirms this:
y_ * tf.log(y)
<tf.Tensor 'mul_8:0' shape=(?, 10) dtype=float32>
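(For completeness, the graph behind that check is roughly the model from the tutorial:)

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)        # predictions, [batch x classes]
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels

print(y_ * tf.log(y))   # Tensor with shape (?, 10), i.e. [batch x classes]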
Now here is what I don't understand. My understanding is that for cross-entropy we need to consider the distributions of y (predicted) and y_ (oracle). So I assume that we would first need to take the reduce_mean of y and y_ over their columns (by class). I would then get two vectors of size:
y_ = [classes x 1]
y = [classes x 1]
Since y_ is the "correct" distribution, we would then take its log (notice that, compared to the example, the vectors are flipped):
log(y_) = [classes x 1]
And now we do an element-wise multiplication:
y * log(y_)
which gives us a vector with the length of the classes. And finally we simply sum this vector to get a single value:
H_y(y_) = -sum( y * log(y_) )
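Put into code, my (possibly mistaken) mental model looks roughly like this (again with made-up values, 3 examples and 3 classes):

import numpy as np

# made-up one-hot labels and softmax outputs, [batch x classes]
y_true = np.array([[0., 1., 0.],
                   [1., 0., 0.],
                   [0., 0., 1.]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])

# step 1: average each column (class) over the batch -> two [classes] vectors
p_oracle = np.mean(y_true, axis=0)   # the "correct" class distribution
p_pred   = np.mean(y_pred, axis=0)   # the predicted class distribution

# step 2: log of the "correct" distribution, element-wise multiply, then sum
H = -np.sum(p_pred * np.log(p_oracle))
print(H)   # a single value for the whole batch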
However, this does not seem to be the calculation that is being performed. Can anyone explain where my error is? Maybe point me to a page with a good explanation. In addition to this, we are using one-hot encoding, so log(1) = 0 and log(0) = -infinity, which would cause errors in the calculation. I understand that the optimizer will calculate the derivatives, but isn't the cross-entropy itself still calculated?
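To make that last worry concrete:

import numpy as np

print(np.log(1.0))        # 0.0
print(np.log(0.0))        # -inf (with a runtime warning)
print(0.0 * np.log(0.0))  # nan -- the 0 from the one-hot label does not "save" it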
TIA.
[1] https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html