
Looking at the documentation for log_loss in scikit-learn and BCELoss in PyTorch, these should be the same, i.e. just the ordinary log loss with weights applied. However, they behave differently, both with and without weights applied. Can anyone explain this? I could not find the source code for BCELoss (which internally refers to binary_cross_entropy).

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.metrics import log_loss

input = torch.randn((3, 1), requires_grad=True)
target = torch.ones((3, 1), requires_grad=False)
w = torch.randn((3, 1), requires_grad=False)

# ----- With weights
w = F.sigmoid(w)
criterion_test = nn.BCELoss(weight=w)
print(criterion_test(input=F.sigmoid(input), target=F.sigmoid(target)))
print(log_loss(y_true=target.detach().numpy(),
               y_pred=F.sigmoid(input).detach().numpy(),
               sample_weight=w.detach().numpy().reshape(-1),
               labels=np.array([0., 1.])))
print("")
print("")

# ----- Without weights
criterion_test = nn.BCELoss()
print(criterion_test(input=F.sigmoid(input), target=F.sigmoid(target)))
print(log_loss(y_true=target.detach().numpy(),
               y_pred=F.sigmoid(input).detach().numpy(),
               labels=np.array([0., 1.])))
Peter Alexander

2 Answers


Regarding the computation without weights, using BCEWithLogitsLoss you get the same result as with sklearn.metrics.log_loss:

import torch
import torch.nn as nn
from sklearn.metrics import log_loss
import numpy as np

input = torch.randn((3, 1), requires_grad=True)
target = torch.ones((3, 1), requires_grad=False)

# ----- Without weights
criterion = torch.nn.BCEWithLogitsLoss()
print('{:.6f}'.format(criterion(input, target)))
print('{:.6f}'.format(log_loss(y_true=target.detach().numpy(),
                               y_pred=torch.sigmoid(input).detach().numpy(),
                               labels=np.array([0., 1.]))))

Note that:

This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
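
To see the stability point concretely, here is a minimal sketch (my own illustration, not part of the answer): for an extreme logit, sigmoid saturates in float32, so BCELoss applied to sigmoid(x) hits its internal clamp of the log at -100, while BCEWithLogitsLoss still returns the exact value.

import torch
import torch.nn as nn

# Illustration only: the second logit is extreme on purpose, so that
# torch.sigmoid() underflows to 0 in float32.
logits = torch.tensor([[0.5], [-120.0]])
target = torch.ones((2, 1))

with_logits = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
plain = nn.BCELoss(reduction='none')(torch.sigmoid(logits), target)

print(with_logits)  # second entry is 120.0, the exact -log(sigmoid(-120))
print(plain)        # second entry is 100.0: sigmoid underflowed to 0 and
                    # BCELoss clamps its log outputs at -100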

sentence
  • Ok, thanks. I did not know that it was more numerically stable. However, in the meantime I found out that BCELoss at least does not 'normalize' the weights internally, whereas log_loss from sklearn does. – Peter Alexander May 01 '19 at 15:54

Actually, I found it out. It turns out that BCELoss and log_loss behave differently whenever the sample weights do not sum to the number of samples: BCELoss with the default reduction='mean' divides the weighted losses by the number of elements, while sklearn's log_loss divides by the sum of the sample weights. Interesting.
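
Here is a small sketch (my own illustration, not from the original answer) demonstrating that difference and how to reconcile the two numbers, assuming plain 0/1 targets:

import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import log_loss

torch.manual_seed(0)
probs = torch.sigmoid(torch.randn(3, 1))
target = torch.ones(3, 1)              # plain binary targets
w = torch.rand(3, 1) + 0.5             # positive weights that do not average to 1

bce = nn.BCELoss(weight=w)(probs, target)          # sum_i w_i * l_i / N
sk = log_loss(y_true=target.numpy().ravel(),
              y_pred=probs.numpy().ravel(),
              sample_weight=w.numpy().ravel(),
              labels=np.array([0., 1.]))           # sum_i w_i * l_i / sum_i w_i

print(bce.item(), sk)                                      # differ unless sum(w) == N
print(bce.item() * target.numel() / w.sum().item(), sk)    # rescaled: they agree up to float32 rounding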

Peter Alexander