PyTorch cross-entropy input dimensions

Question

I'm trying to develop a binary classifier with Hugging Face's BertModel and PyTorch. The classifier module looks something like this:

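import torch.nn as nn
from transformers import BertModel

class SSTClassifierModel(nn.Module):

  def __init__(self, num_classes = 2, hidden_size = 768):
    super(SSTClassifierModel, self).__init__()
    self.number_of_classes = num_classes
    self.dropout = nn.Dropout(0.01)
    self.hidden_size = hidden_size
    self.bert = BertModel.from_pretrained('bert-base-uncased')
    self.classifier = nn.Linear(hidden_size, num_classes)

  def forward(self, input_ids, att_masks, token_type_ids, labels):
    # Pass the masks by keyword: BertModel's positional order is
    # (input_ids, attention_mask, token_type_ids).
    outputs = self.bert(input_ids, attention_mask=att_masks, token_type_ids=token_type_ids)
    embedding = outputs[1]  # pooled [CLS] representation, shape (N, hidden_size)
    output = self.classifier(self.dropout(embedding))
    return output  # raw logits, shape (N, num_classes); labels is unused here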

The way I train the model is as follows:

from torch.nn import BCELoss

loss_function = BCELoss()
model.train()
for epoch in range(NO_OF_EPOCHS):
    for step, batch in enumerate(train_dataloader):
        input_ids = batch[0].to(device)
        input_mask = batch[1].to(device)
        token_type_ids = batch[2].to(device)
        labels = batch[3].to(device)
        # assuming batch size = 3, labels is something like:
        # tensor([[0],[1],[1]])
        model.zero_grad()
        model_output = model(input_ids,
                             input_mask,
                             token_type_ids,
                             labels)
        # model output is something like (with batch size = 3):
        # tensor([[ 0.3566, -0.0333],
        #         [ 0.1154,  0.2842],
        #         [-0.0016,  0.3767]], grad_fn=<AddmmBackward>)

        loss = loss_function(model_output.view(-1,2), labels.view(-1))

I'm doing the .view() calls because Hugging Face's source code for BertForSequenceClassification computes the loss in exactly the same way. But I get this error:

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
   2068     if input.numel() != target.numel():
   2069         raise ValueError("Target and input must have the same number of elements. target nelement ({}) "
-> 2070                          "!= input nelement ({})".format(target.numel(), input.numel()))
   2071 
   2072     if weight is not None:

ValueError: Target and input must have the same number of elements. target nelement (3) != input nelement (6)

Is there something wrong with my labels, or with my model's output? I'm really stuck here. The documentation for PyTorch's BCELoss says:

Input: (N,∗) where ∗ means any number of additional dimensions
Target: (N,∗), same shape as the input

How should I make my labels the same shape as the model output? I feel like there's something huge that I'm missing, but I can't find it.
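To see this shape rule in isolation, here is a minimal sketch with hypothetical toy tensors rather than the real model: matching (N,) input and target shapes work, while a (3, 2) input against a (3,) target is rejected with a ValueError, the same 6-versus-3 element mismatch as in the traceback above. The exact error message differs between PyTorch versions.

```python
import torch
import torch.nn as nn

loss_function = nn.BCELoss()

# Hypothetical toy values for a batch of N = 3 examples.
probs = torch.tensor([0.6, 0.3, 0.8])    # shape (3,): one probability per example
targets = torch.tensor([1.0, 0.0, 1.0])  # shape (3,): float labels, same shape as input

loss = loss_function(probs, targets)     # shapes match, so this works
print(loss.item())

# A (3, 2) input has 6 elements but the target has only 3, so BCELoss rejects it.
two_column = torch.tensor([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
mismatch_error = None
try:
    loss_function(two_column, targets)
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```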

Umang Gupta · Accepted Answer · 2020-04-03T20:14:28.593

A few observations:

The code that you refer to uses CrossEntropyLoss, but you are using BCELoss.
CrossEntropyLoss takes prediction logits (size: (N,D), where D is the number of classes) and target class labels (size: (N,)), whereas BCELoss takes p(y=1|x) (size: (N,)) and float target labels (size: (N,)), because p(y=0|x) can be computed as 1 - p(y=1|x).
CrossEntropyLoss expects logits (unnormalized scores), whereas BCELoss expects probability values.
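For the two-class case, the following sketch (reusing the logits and labels printed in the question as toy data) checks that CrossEntropyLoss on (N, 2) logits and BCELoss on p(y=1|x) = softmax(logits)[:, 1] give the same value, which is why one probability per example is enough for BCELoss.

```python
import torch
import torch.nn as nn

# Logits for N = 3 examples and D = 2 classes, taken from the question's printout.
logits = torch.tensor([[0.3566, -0.0333],
                       [0.1154, 0.2842],
                       [-0.0016, 0.3767]])
labels = torch.tensor([0, 1, 1])  # class indices, shape (N,)

# CrossEntropyLoss: raw logits of shape (N, D) and integer class indices of shape (N,).
ce = nn.CrossEntropyLoss()(logits, labels)

# BCELoss: p(y=1|x) of shape (N,) and float targets of shape (N,).
p_y1 = torch.softmax(logits, dim=1)[:, 1]  # p(y=0|x) is just 1 - p_y1
bce = nn.BCELoss()(p_y1, labels.float())

print(ce.item(), bce.item())  # equal up to floating-point error
```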

Solution:

Since you pass an (N, 2) tensor while the target has shape (N,), you get the error. You only need to pass p(y=1|x), so you can do:

loss = loss_function(model_output.view(-1,2)[:,1], labels.view(-1).float())

Above, I assumed that the second value is p(y=1|x).

A cleaner way would be to make the model output only one value, i.e., p(y=1|x), and pass it to the loss function. It seems from your code that you are passing logits and not probabilities, so you would also need to apply torch.sigmoid(model_output) if you want to use BCELoss, or alternatively you can use BCEWithLogitsLoss, which takes the logits directly.
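A minimal sketch of the single-output variant, assuming random features as a hypothetical stand-in for BERT's pooled output so it runs without downloading a model. The hypothetical `classifier` head has one output unit, and the sketch checks that BCEWithLogitsLoss on the logit matches BCELoss on its sigmoid.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_size = 768
# Hypothetical stand-in for BERT's pooled output: random features for N = 3 examples.
pooled = torch.randn(3, hidden_size)
labels = torch.tensor([[0], [1], [1]])  # same (N, 1) layout as in the question

# One output unit: the logit for p(y=1|x).
classifier = nn.Linear(hidden_size, 1)
logits = classifier(pooled)  # shape (3, 1), same shape as labels

# BCEWithLogitsLoss applies the sigmoid internally (numerically more stable).
loss_with_logits = nn.BCEWithLogitsLoss()(logits, labels.float())

# Equivalent two-step route: sigmoid first, then BCELoss.
loss_two_step = nn.BCELoss()(torch.sigmoid(logits), labels.float())

print(loss_with_logits.item(), loss_two_step.item())
```

Because the single logit already has shape (N, 1), the question's (N, 1) labels only need `.float()`; no `.view()` is required.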

Another alternative is to switch the loss to CrossEntropyLoss, which also works for binary labels: it takes the (N, 2) logits directly, with integer class labels of shape (N,).
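A minimal sketch of the CrossEntropyLoss route in a single training step, assuming a hypothetical `nn.Linear(768, 2)` stand-in for the BERT model and random features for one batch. The question's original loss line works unchanged once the loss function is CrossEntropyLoss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the BERT-based model: a linear layer producing 2 logits.
model = nn.Linear(768, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_function = nn.CrossEntropyLoss()

features = torch.randn(3, 768)          # stand-in for one batch of inputs
labels = torch.tensor([[0], [1], [1]])  # (N, 1) integer labels, as in the question

model.zero_grad()
model_output = model(features)          # (N, 2) raw logits; no softmax needed
# Same line as in the question: (N, 2) logits vs (N,) class indices.
loss = loss_function(model_output.view(-1, 2), labels.view(-1))
loss.backward()
optimizer.step()
print(loss.item())
```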

Oh! Thank you. Both solved my problem, except that the output of my model is not p(y=1|x) and p(y=0|x), so I should use a softmax for that! — P.Alipoor, Apr 03 '20 at 20:08
Yeah, sorry, I realized a little late. I updated the answer to cover all three scenarios: using CrossEntropyLoss, BCELoss, and BCEWithLogitsLoss. — Umang Gupta, Apr 03 '20 at 20:10
