In the constructor of tf.keras.losses.BinaryCrossentropy(), you'll notice:
tf.keras.losses.BinaryCrossentropy(
from_logits=False, label_smoothing=0, reduction=losses_utils.ReductionV2.AUTO,
name='binary_crossentropy'
)
The default reduction argument is AUTO, which in almost all cases resolves to Reduction.SUM_OVER_BATCH_SIZE, as mentioned here (and verified with code below). Assume that the shape of our model's outputs is [1, 3]. Meaning, our batch size is 1 and the output dimension is 3 (this does not imply that there are 3 classes). Keras first averages the per-element losses over the last axis, giving one loss per sample, and SUM_OVER_BATCH_SIZE then averages those per-sample losses over the 0th axis, i.e. the batch dimension.
I'll make this clear with code:
import tensorflow as tf
import numpy as np

y_true = np.array([1., 1., 1.]).reshape(1, 3)
y_pred = np.array([1., 1., 0.]).reshape(1, 3)

# Explicitly request the reduction that AUTO resolves to here.
bce = tf.keras.losses.BinaryCrossentropy(
    from_logits=False,
    reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE
)
loss = bce(y_true, y_pred)
print(loss.numpy())
The output is,
5.1416497230529785
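As a quick check that AUTO really resolves to SUM_OVER_BATCH_SIZE here, the default constructor (reduction left at AUTO) gives the same value on the same y_true and y_pred:

bce_auto = tf.keras.losses.BinaryCrossentropy()
print(bce_auto(y_true, y_pred).numpy())  # 5.1416497, same as above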
The expression for Binary Crossentropy is the same as the one mentioned in the question, where N refers to the batch size.
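Written out for outputs of shape [N, d], and ignoring the epsilon clipping discussed below, the quantity Keras effectively computes is

$$\text{loss} = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{d}\sum_{j=1}^{d}\Big[\,y_{ij}\log(p_{ij}) + (1-y_{ij})\log(1-p_{ij})\,\Big]$$

i.e. a mean over the d outputs of each sample, followed by a mean over the N samples. With N = 1 and d = 3 here, that is simply the mean of the three per-element terms.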
We now implement BCE on our own. First, we clip the outputs of our model, setting the minimum to tf.keras.backend.epsilon() and the maximum to 1 - tf.keras.backend.epsilon(). The value of tf.keras.backend.epsilon() is 1e-7.
# Clip predictions into [epsilon, 1 - epsilon] so log() never sees exactly 0 or 1.
y_pred = np.clip(y_pred, tf.keras.backend.epsilon(), 1 - tf.keras.backend.epsilon())
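After clipping, only the third prediction changes; it becomes exactly epsilon:

print(y_pred)  # the first two entries are now 1 - 1e-07 and the last is 1e-07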
Using the expression for BCE,
# First term: contribution where y_true == 1.
p1 = y_true * np.log(y_pred + tf.keras.backend.epsilon())
# Second term: contribution where y_true == 0.
p2 = (1 - y_true) * np.log(1 - y_pred + tf.keras.backend.epsilon())
print(p1)
print(p2)
The output,
[[ 0. 0. -15.42494847]]
[[-0. -0. 0.]]
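That single large value is exactly the clipped third element at work: y_pred was clipped to epsilon there, and another epsilon is added inside the log, so the term is log(1e-7 + 1e-7):

print(np.log(2e-7))  # approximately -15.42494847, matching the value above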
Notice that the shapes are still preserved: both p1 and p2 have shape [1, 3]. An np.dot will turn them into an array of two elements, i.e. of shape [1, 2] (as in your implementation).
Finally, we add them and take the negative mean with np.mean(). With no axis argument, np.mean() averages over all elements; since our batch size is 1, this is exactly the per-sample mean over the 3 outputs followed by the batch average that Keras performs:
o = -np.mean(p1 + p2)  # mean over all 3 elements, then negate
print( o )
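Since p1 + p2 has a single nonzero entry of about -15.42494847, its mean over the 3 elements is about -5.14164949, and the leading minus sign makes it positive.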
The output is,
5.141649490132791
The tiny difference from the Keras value is consistent with Keras doing its arithmetic in float32 (the default floatx), while our NumPy computation runs in float64.
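To see SUM_OVER_BATCH_SIZE average over a real batch, here is a minimal sketch with a hypothetical batch of two samples (the numbers are made up purely for illustration):

y_true_b = np.array([[1., 1., 1.], [0., 0., 1.]])
y_pred_b = np.array([[0.9, 0.8, 0.7], [0.1, 0.2, 0.6]])

# Keras: per-element BCE -> mean over the last axis -> mean over the batch.
print(tf.keras.losses.BinaryCrossentropy()(y_true_b, y_pred_b).numpy())

# Manual equivalent of the two-stage reduction:
eps = tf.keras.backend.epsilon()
p = np.clip(y_pred_b, eps, 1 - eps)
per_element = -(y_true_b * np.log(p) + (1 - y_true_b) * np.log(1 - p))
per_sample = per_element.mean(axis=-1)  # shape (2,): one loss per sample
print(per_sample.mean())                # matches the Keras value up to float precision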
You can check the problem in your implementation by printing the shape of each of the terms.