
I'm totally new to PyTorch. I was taking an e-course and experimenting with PyTorch when I came across these two loss functions (my understanding is that the rationale for having both is numerical stability with logits):

nn.BCEWithLogitsLoss()

and

nn.BCELoss()

With the appropriate adjustments to the code for each of these two loss functions, I got quite different accuracy curves! For example, with nn.BCELoss() as in the snippet below:

model = nn.Sequential(
    nn.Linear(D, 1),
    nn.Sigmoid()
)

criterion = nn.BCELoss()

Accuracy plot: [image: accuracy curve for the BCELoss model]

And for nn.BCEWithLogitsLoss(), as below:

model = nn.Linear(D, 1)
criterion = nn.BCEWithLogitsLoss()

Accuracy plot: [image: accuracy curve for the BCEWithLogitsLoss model]

The rest of the code is the same for both examples. (Note that the loss curves were similar and decent for both; the learning curves looked like this: [image: loss curves].) I couldn't figure out what is causing this problem, whether there is a bug in my code or something wrong with my PyTorch setup. Thank you in advance for your time and help.

  • `BCEWithLogitsLoss` "combines a Sigmoid layer and the BCELoss in one single class." That is, you should not have the sigmoid activation before the `BCEWithLogitsLoss` as it's going to add the sigmoid for you. Since you have the sigmoid it's getting applied twice when you compute loss but only once when you compute accuracy. – Oliver Dain Apr 10 '23 at 18:26
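
To see the comment's point concretely, here is a minimal check (a sketch, not from the original post): BCEWithLogitsLoss on raw outputs computes the same value as BCELoss on sigmoid-ed outputs, so a model that already ends in nn.Sigmoid() effectively gets the sigmoid applied twice inside the loss.

import torch
import torch.nn as nn

z = torch.randn(8, 1)                    # raw logits
y = torch.randint(0, 2, (8, 1)).float()  # binary targets

with_logits = nn.BCEWithLogitsLoss()(z, y)
plain_bce = nn.BCELoss()(torch.sigmoid(z), y)
print(torch.allclose(with_logits, plain_bce))  # True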

2 Answers


nn.BCELoss() expects your outputs to be probabilities, that is, with the sigmoid activation applied.
nn.BCEWithLogitsLoss() expects your outputs to be logits, that is, without the sigmoid activation.

I think you may have calculated something incorrectly (most likely the accuracy). Here is a simple example based on your code:

With probabilities:

import torch
import torch.nn as nn

# Synthetic data: the label is 1 exactly when x > 0.
dummy_x = torch.randn(1000, 1)
dummy_y = (dummy_x > 0).type(torch.float)

model1 = nn.Sequential(
    nn.Linear(1, 1),
    nn.Sigmoid()
)
criterion1 = nn.BCELoss()
optimizer = torch.optim.Adam(model1.parameters(), 0.001)

def binary_accuracy(preds, y, logits=False):
    # If the model outputs logits, apply the sigmoid before rounding.
    if logits:
        rounded_preds = torch.round(torch.sigmoid(preds))
    else:
        rounded_preds = torch.round(preds)
    correct = (rounded_preds == y).float()
    accuracy = correct.sum() / len(y)
    return accuracy

for e in range(2000):
    y_hat = model1(dummy_x)
    loss = criterion1(y_hat, dummy_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y)}")

#Result:
Epoch: 100, Loss: 0.5840
Epoch: 100, Acc: 0.5839999914169312
Epoch: 200, Loss: 0.5423
Epoch: 200, Acc: 0.6499999761581421
...
Epoch: 1800, Loss: 0.2862
Epoch: 1800, Acc: 0.9950000047683716
Epoch: 1900, Loss: 0.2793
Epoch: 1900, Acc: 0.9929999709129333

Now with logits:

model2 = nn.Linear(1, 1)
criterion2 = nn.BCEWithLogitsLoss()
optimizer2 = torch.optim.Adam(model2.parameters(), 0.001)
for e in range(2000):
    y_hat = model2(dummy_x)
    loss = criterion2(y_hat, dummy_y)
    optimizer2.zero_grad()
    loss.backward()
    optimizer2.step()

    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y, logits=True)}")

#Results: 
Epoch: 100, Loss: 1.1042
Epoch: 100, Acc: 0.007000000216066837
Epoch: 200, Loss: 1.0484
Epoch: 200, Acc: 0.01899999938905239
...
Epoch: 1800, Loss: 0.5019
Epoch: 1800, Acc: 0.9879999756813049
Epoch: 1900, Loss: 0.4844
Epoch: 1900, Acc: 0.9879999756813049
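
As a follow-up to the example above: at inference time the logits model has to apply the sigmoid explicitly, since nn.BCEWithLogitsLoss only applies it inside the loss computation. A short sketch using the names from the snippet above:

with torch.no_grad():
    probs = torch.sigmoid(model2(dummy_x))    # logits -> probabilities
    preds = (probs > 0.5).float()             # threshold at 0.5
    print((preds == dummy_y).float().mean())  # final accuracy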
TheEngineerProgrammer

You need to adjust the rest of the code according to the loss function (a.k.a. criterion) you are using. For BCELoss: since your model ends with a sigmoid layer, its outputs are already probabilities between 0 and 1.

For BCEWithLogitsLoss: the output is a logit, which can be negative or positive. The logit is the raw linear output z, where

z = w1*x1 + w2*x2 + ... + wn*xn + b

So for your predictions while using BCEWithLogitsLoss, you need to pass this output through a sigmoid; for this you can write a small function that returns

1 / (1 + np.exp(-z))

or simply use torch.sigmoid, and only then calculate the accuracy.
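
For example, a minimal helper along these lines (the function name is mine, purely illustrative):

import torch

def accuracy_from_logits(logits, y):
    probs = torch.sigmoid(logits)       # logits -> probabilities
    preds = (probs > 0.5).float()       # threshold at 0.5
    return (preds == y).float().mean()  # fraction of correct predictions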

Hope this helps!!!

moken