
Looking at the documentation for log_loss in scikit-learn and BCELoss in PyTorch, these should be the same, i.e. just the ordinary log loss with weights applied. However, they behave differently, both with and without weights applied. Can anyone explain this? I could not find the source code for BCELoss (which internally refers to binary_cross_entropy).

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.metrics import log_loss

input = torch.randn((3, 1), requires_grad=True)
target = torch.ones((3, 1), requires_grad=False)
w = torch.randn((3, 1), requires_grad=False)

# ----- With weights
w = F.sigmoid(w)
criterion_test = nn.BCELoss(weight=w)
print(criterion_test(input=F.sigmoid(input), target=F.sigmoid(target)))
print(log_loss(y_true=target.detach().numpy(),
               y_pred=F.sigmoid(input).detach().numpy(),
               sample_weight=w.detach().numpy().reshape(-1),
               labels=np.array([0., 1.])))
print("")
print("")

# ----- Without weights
criterion_test = nn.BCELoss()
print(criterion_test(input=F.sigmoid(input), target=F.sigmoid(target)))
print(log_loss(y_true=target.detach().numpy(),
               y_pred=F.sigmoid(input).detach().numpy(),
               labels=np.array([0., 1.])))
Peter Alexander

2 Answers


Regarding the computation without weights, using BCEWithLogitsLoss you get the same result as with sklearn.metrics.log_loss:

import torch
import torch.nn as nn
from sklearn.metrics import log_loss
import numpy as np

input = torch.randn((3, 1), requires_grad=True)
target = torch.ones((3, 1), requires_grad=False)

# ----- Without weights
criterion = torch.nn.BCEWithLogitsLoss()
print('{:.6f}'.format(criterion(input, target)))
print('{:.6f}'.format(log_loss(y_true=target.detach().numpy(),
                               y_pred=torch.sigmoid(input).detach().numpy(),
                               labels=np.array([0., 1.]))))

Note that:

This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
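
To see the stability point concretely, here is a minimal sketch (my own illustration, not part of the answer): for an extreme logit, sigmoid saturates in float32, so BCELoss applied to sigmoid(x) hits its internal clamp of the log at -100, while BCEWithLogitsLoss still returns the exact value.

import torch
import torch.nn as nn

# Illustration only: the second logit is extreme on purpose, so that
# torch.sigmoid() underflows to 0 in float32.
logits = torch.tensor([[0.5], [-120.0]])
target = torch.ones((2, 1))

with_logits = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
plain = nn.BCELoss(reduction='none')(torch.sigmoid(logits), target)

print(with_logits)  # second entry is 120.0, the exact -log(sigmoid(-120))
print(plain)        # second entry is 100.0: sigmoid underflowed to 0 and
                    # BCELoss clamps its log outputs at -100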

sentence
  • Ok, thanks. I did not know that it was more numerically stable. However, in the meantime I found out that BCELoss at least does not 'normalize' the weights internally, whereas log_loss from sklearn does. – Peter Alexander May 01 '19 at 15:54

Actually, I found it out. It turns out that BCELoss and log_loss behave differently whenever the sample weights do not sum to the number of samples: BCELoss with the default reduction='mean' divides the weighted losses by the number of elements, while sklearn's log_loss divides by the sum of the sample weights. Interesting.
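
Here is a small sketch (my own illustration, not from the original answer) demonstrating that difference and how to reconcile the two numbers, assuming plain 0/1 targets:

import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import log_loss

torch.manual_seed(0)
probs = torch.sigmoid(torch.randn(3, 1))
target = torch.ones(3, 1)              # plain binary targets
w = torch.rand(3, 1) + 0.5             # positive weights that do not average to 1

bce = nn.BCELoss(weight=w)(probs, target)          # sum_i w_i * l_i / N
sk = log_loss(y_true=target.numpy().ravel(),
              y_pred=probs.numpy().ravel(),
              sample_weight=w.numpy().ravel(),
              labels=np.array([0., 1.]))           # sum_i w_i * l_i / sum_i w_i

print(bce.item(), sk)                                      # differ unless sum(w) == N
print(bce.item() * target.numel() / w.sum().item(), sk)    # rescaled: they agree up to float32 rounding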

Peter Alexander