
I am using a model training program that I built for a toy example and am now trying to reuse it on another dataset. The only difference is that the original model was used for regression, so I used MSE as the error criterion, whereas this one is used for binary classification, so I am using BCEWithLogitsLoss.
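For context, this is how I understand the two criteria differ (just a small self-contained check I wrote, not part of the training program): MSELoss compares real-valued predictions against real-valued targets, while BCEWithLogitsLoss takes the raw (un-sigmoided) logits and float targets of the same shape.

import torch
import torch.nn as nn

logits = torch.tensor([[-0.1], [0.3]])           # raw network outputs, no sigmoid applied
targets = torch.tensor([[0.], [1.]])             # float labels with the same shape as the logits
print(nn.BCEWithLogitsLoss()(logits, targets))   # sigmoid + binary cross-entropy in one call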

The model is very simple:

class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc1 = nn.Sequential( 
            nn.Linear(input_size, 8*input_size),
            nn.PReLU() #parametric relu - same as leaky relu except the slope is learned
        )
        self.fc2 = nn.Sequential( 
            nn.Linear(8*input_size, 80*input_size),
            nn.PReLU()
        )
        self.fc3 = nn.Sequential( 
            nn.Linear(80*input_size, 32*input_size),
            nn.PReLU()
        )
        self.fc4 = nn.Sequential( 
            nn.Linear(32*input_size, 4*input_size),
            nn.PReLU()
        )                   
        self.fc = nn.Sequential( 
            nn.Linear(4*input_size, output_size),
            nn.PReLU()
        )
                        

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc(x)

        return x

And this is where I run it:

model = Model(input_size, output_size)

if (loss == 'MSE'):
    criterion = nn.MSELoss()
if (loss == 'BCELoss'):
    criterion = nn.BCEWithLogitsLoss()

optimizer = torch.optim.SGD(model.parameters(), lr = lr)

model.train()
for epoch in range(num_epochs):
    # Forward pass and loss
    train_predictions = model(train_features)
    print(train_predictions)
    print(train_targets)


    loss = criterion(train_predictions, train_targets)
    
    # Backward pass and update
    loss.backward()
    optimizer.step()

    # zero grad before new step
    optimizer.zero_grad()


    train_size = len(train_features)
    train_loss = criterion(train_predictions, train_targets).item() 
    pred = train_predictions.max(1, keepdim=True)[1] 
    correct = pred.eq(train_targets.view_as(pred)).sum().item()
    #train_loss /= train_size
    accuracy = correct / train_size
    print('\nTrain set: Loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        train_loss, correct, train_size,
        100. * accuracy))

However, when I print the loss, for some reason it already starts very low (around 0.68) before I have done any backward pass, and it stays that low for all subsequent epochs. The prediction vector, however, looks like random garbage...

tensor([[-0.0447],
        [-0.0640],
        [-0.0564],
        ...,
        [-0.0924],
        [-0.0113],
        [-0.0774]], grad_fn=<PreluBackward>)
tensor([[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [1.]])
epoch: 1, loss = 0.6842

I have no clue why it is doing that, and I would appreciate any help. Thanks!

EDIT: I added the params in case they help anyone figure this out:

if (dataset == 'adult_train.csv'):
    input_size=9
    print_every = 1
    output_size = 1
    lr = 0.001
    num_epochs = 10
    loss='BCELoss'

EDIT2: Added the accuracy calculation to the middle code block.

Guy

1 Answer

BCELoss is not an error rate, so a value around 0.69 does not by itself indicate a problem.

The entropy of a Bernoulli distribution with p=0.5 is -ln(0.5) = 0.693. This is the loss you would expect if

  1. Your data is evenly distributed
  2. Your network is guessing randomly

or

  1. Your network always predicts a uniform distribution

Your model is in the second case. The network is currently guessing slightly negative logits for every prediction, which will be interpreted as class-0 predictions. Since it seems your data is imbalanced towards 0 labels, your accuracy will be the same as that of a model that always predicts 0. This is just an artifact of random weight initialization; if you keep reinitializing your model, you'll find that sometimes it always predicts 1 instead.
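To see this concretely, here is a quick sanity check (not from your code, just an illustration): feeding near-zero logits to BCEWithLogitsLoss gives a loss of ln(2) ≈ 0.693 no matter what the labels are.

import torch
import torch.nn as nn

targets = torch.randint(0, 2, (1000, 1)).float()   # any mix of 0/1 labels
logits = torch.zeros(1000, 1)                       # "uniform" prediction: sigmoid(0) = 0.5
print(nn.BCEWithLogitsLoss()(logits, targets))      # tensor(0.6931), i.e. ln(2)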

jodag
  • So should I change the loss? What loss should I be using then? – Guy Dec 31 '20 at 15:35
  • BCELoss is fine. But you may want to balance it by weighting classes differently. Or change the dataset sampling to be weighted evenly. Why do you think the value of BCELoss should be higher than what you get? – jodag Dec 31 '20 at 15:36
  • I did some manipulations on the training data and removed enough rows so I am left with a file whose labels are 50% 0 and 50% 1. I added an accuracy print for each epoch and I always get "Accuracy: 7508/15016 (50%)" for the training data, which leads me to believe there's no learning process here (although it seems like the BCELoss is dropping, weirdly enough) . Any idea what to do? – Guy Dec 31 '20 at 19:05
  • Accuracy should be computed by treating any output < 0 as class 0 and any output > 0 as class 1. How are you computing accuracy? – jodag Dec 31 '20 at 19:45
  • I updated the original question (2nd code block) with the accuracy calculation that I am using – Guy Dec 31 '20 at 19:59
  • `pred = train_predictions.max(1, keepdim=True)[1]` should be `pred = (train_predictions > 0).long()` as mentioned in the previous comment. – jodag Dec 31 '20 at 20:08
  • Thanks! It seems the accuracy is updated after all. – Guy Dec 31 '20 at 20:12
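For completeness, a minimal, self-contained sketch of the accuracy computation suggested above (example values, not the question's data):

import torch

train_predictions = torch.tensor([[-0.2], [0.7], [-1.3], [-0.1]])  # example raw logits from the model
train_targets = torch.tensor([[0.], [1.], [1.], [0.]])             # example labels

# threshold the raw logits at 0: sigmoid(x) > 0.5 exactly when x > 0
pred = (train_predictions > 0).long()
correct = pred.eq(train_targets.view_as(pred)).sum().item()
accuracy = correct / len(train_targets)
print(correct, accuracy)  # 3 correct out of 4 -> 0.75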