
I'm quite new to PyTorch. I want to compute the loss over a batch in a Transformer. In this case my 'batch' has only two replicas. The model outputs a tensor of shape (2, 73, 33):

output
tensor([[[ 21.1355,  -7.5047,   2.8138,  ..., -14.1462, -15.1999,  -7.2595],...
output.size()
>>> torch.Size([2, 73, 33])

The target holds the class index for each position. It has shape (2, 73):

target.size()
>>> torch.Size([2, 73])

When I compute the loss on the first replica alone, I get a value:

loss = torch.nn.CrossEntropyLoss(ignore_index=1)
loss(output[0], target[0])
tensor(0.1967)

But it raises an error when I pass the whole batch at once:

loss = torch.nn.CrossEntropyLoss(ignore_index=1)
loss(output, target)
ValueError: Expected target size (2, 33), got torch.Size([2, 73])

Do I have to loop over the replicas and average them? Any help is much appreciated.
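
For reference, `torch.nn.CrossEntropyLoss` expects the class dimension in position 1, so for sequence data the input should have shape (batch, classes, length) while the target keeps shape (batch, length). That is why a (2, 73, 33) input makes it expect a (2, 33) target. The sketch below is a minimal illustration, assuming random stand-in tensors with the shapes from the question (not the real model output); the names `loss_fn`, `loss_permuted` and `loss_flat` are hypothetical. It shows two common ways to feed the whole batch at once without a loop.

```python
import torch

torch.manual_seed(0)

# Stand-in tensors with the shapes from the question (random values, an assumption).
batch_size, seq_len, num_classes = 2, 73, 33
output = torch.randn(batch_size, seq_len, num_classes)          # (2, 73, 33)
target = torch.randint(0, num_classes, (batch_size, seq_len))   # (2, 73)

loss_fn = torch.nn.CrossEntropyLoss(ignore_index=1)

# Option 1: move the class dimension to position 1 -> (batch, classes, seq_len).
loss_permuted = loss_fn(output.permute(0, 2, 1), target)

# Option 2: merge batch and sequence dimensions -> (batch * seq_len, classes).
loss_flat = loss_fn(output.reshape(-1, num_classes), target.reshape(-1))

print(loss_permuted, loss_flat)
```

Both options should give the same value. Note that with the default `reduction='mean'` the loss is averaged over all non-ignored positions in the batch, which matches the plain average of the per-replica losses only when each replica has the same number of non-padding tokens.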

katze
  • Why do you output 73x33 when you have a 73x1 target? If you want to get rid of the 33 (third dim of your output), maybe try `ignore_index=2` – Theodor Peifer Apr 20 '21 at 10:15
  • ignore_index refers to the index to ignore in the target: because the inputs have different lengths, they are padded to the same length with a padding token (in this case the number 1) – katze Apr 20 '21 at 10:26
