In the constructor of tf.keras.losses.BinaryCrossentropy(), you'll notice:
tf.keras.losses.BinaryCrossentropy(
from_logits=False, label_smoothing=0, reduction=losses_utils.ReductionV2.AUTO,
name='binary_crossentropy'
)
The default reduction argument is AUTO, which in almost all cases resolves to Reduction.SUM_OVER_BATCH_SIZE, as mentioned here (and verified with code below). Assume that the shape of our model's outputs is [1, 3]. Meaning, our batch size is 1 and the output dimension is 3 (this does not imply that there are 3 classes). Keras first averages the per-element losses over the last axis, giving one loss per sample, and SUM_OVER_BATCH_SIZE then averages those per-sample losses over the 0th axis, i.e. the batch dimension.
I'll make this clear with code:
import tensorflow as tf
import numpy as np

y_true = np.array([1., 1., 1.]).reshape(1, 3)
y_pred = np.array([1., 1., 0.]).reshape(1, 3)

# Explicitly request the reduction that AUTO resolves to here.
bce = tf.keras.losses.BinaryCrossentropy(
    from_logits=False,
    reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE
)
loss = bce(y_true, y_pred)
print(loss.numpy())
The output is,
5.1416497230529785
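As a quick check that AUTO really resolves to SUM_OVER_BATCH_SIZE here, the default constructor (reduction left at AUTO) gives the same value on the same y_true and y_pred:

bce_auto = tf.keras.losses.BinaryCrossentropy()
print(bce_auto(y_true, y_pred).numpy())  # 5.1416497, same as above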
The expression for Binary Crossentropy is the same as the one mentioned in the question, where N refers to the batch size.
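Written out for outputs of shape [N, d], and ignoring the epsilon clipping discussed below, the quantity Keras effectively computes is

$$\text{loss} = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{d}\sum_{j=1}^{d}\Big[\,y_{ij}\log(p_{ij}) + (1-y_{ij})\log(1-p_{ij})\,\Big]$$

i.e. a mean over the d outputs of each sample, followed by a mean over the N samples. With N = 1 and d = 3 here, that is simply the mean of the three per-element terms.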
We now implement BCE on our own. First, we clip the outputs of our model, setting the minimum to tf.keras.backend.epsilon() and the maximum to 1 - tf.keras.backend.epsilon(). The value of tf.keras.backend.epsilon() is 1e-7.
# Clip predictions into [epsilon, 1 - epsilon] so log() never sees exactly 0 or 1.
y_pred = np.clip(y_pred, tf.keras.backend.epsilon(), 1 - tf.keras.backend.epsilon())
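After clipping, only the third prediction changes; it becomes exactly epsilon:

print(y_pred)  # the first two entries are now 1 - 1e-07 and the last is 1e-07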
Using the expression for BCE,
# First term: contribution where y_true == 1.
p1 = y_true * np.log(y_pred + tf.keras.backend.epsilon())
# Second term: contribution where y_true == 0.
p2 = (1 - y_true) * np.log(1 - y_pred + tf.keras.backend.epsilon())
print(p1)
print(p2)
The output,
[[ 0. 0. -15.42494847]]
[[-0. -0. 0.]]
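That single large value is exactly the clipped third element at work: y_pred was clipped to epsilon there, and another epsilon is added inside the log, so the term is log(1e-7 + 1e-7):

print(np.log(2e-7))  # approximately -15.42494847, matching the value above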
Notice that the shapes are still preserved: both p1 and p2 have shape [1, 3]. An np.dot will turn them into an array of two elements, i.e. of shape [1, 2] (as in your implementation).
Finally, we add them and take the negative mean with np.mean(). With no axis argument, np.mean() averages over all elements; since our batch size is 1, this is exactly the per-sample mean over the 3 outputs followed by the batch average that Keras performs:
o = -np.mean(p1 + p2)  # mean over all 3 elements, then negate
print( o )
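Since p1 + p2 has a single nonzero entry of about -15.42494847, its mean over the 3 elements is about -5.14164949, and the leading minus sign makes it positive.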
The output is,
5.141649490132791
The tiny difference from the Keras value is consistent with Keras doing its arithmetic in float32 (the default floatx), while our NumPy computation runs in float64.
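To see SUM_OVER_BATCH_SIZE average over a real batch, here is a minimal sketch with a hypothetical batch of two samples (the numbers are made up purely for illustration):

y_true_b = np.array([[1., 1., 1.], [0., 0., 1.]])
y_pred_b = np.array([[0.9, 0.8, 0.7], [0.1, 0.2, 0.6]])

# Keras: per-element BCE -> mean over the last axis -> mean over the batch.
print(tf.keras.losses.BinaryCrossentropy()(y_true_b, y_pred_b).numpy())

# Manual equivalent of the two-stage reduction:
eps = tf.keras.backend.epsilon()
p = np.clip(y_pred_b, eps, 1 - eps)
per_element = -(y_true_b * np.log(p) + (1 - y_true_b) * np.log(1 - p))
per_sample = per_element.mean(axis=-1)  # shape (2,): one loss per sample
print(per_sample.mean())                # matches the Keras value up to float precision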
You can check the problem in your implementation by printing the shape of each of the terms.