I'm quite new to PyTorch. I want to compute the loss of a batch in a Transformer. In this case my batch has only two replicas. The model output for the batch is a tensor of shape (2, 73, 33):
>>> output
tensor([[[ 21.1355, -7.5047, 2.8138, ..., -14.1462, -15.1999, -7.2595],...
>>> output.size()
torch.Size([2, 73, 33])
The target contains the categorical label (class index) for each position and has shape (2, 73):
>>> target.size()
torch.Size([2, 73])
When I compute the loss for the first replica only, I get a value:
loss = torch.nn.CrossEntropyLoss(ignore_index=1)
loss(output[0], target[0])
tensor(0.1967)
But it fails when I compute the loss for the whole batch at once:
loss = torch.nn.CrossEntropyLoss(ignore_index=1)
loss(output, target)
ValueError: Expected target size (2, 33), got torch.Size([2, 73])
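A minimal reproducible sketch of the two calls above, assuming random dummy tensors with the same shapes as in the question (the values are not my real data). According to the CrossEntropyLoss documentation, a 3-D input is interpreted as (N, C, d1), which seems consistent with the (2, 33) target size the error asks for. Depending on the PyTorch version, the failure may surface as a ValueError or a RuntimeError with slightly different wording, so the sketch catches both.

```python
import torch

# Dummy tensors with the same shapes as in the question (random values, not real data).
torch.manual_seed(0)
output = torch.randn(2, 73, 33)          # (replicas, positions, classes)
target = torch.randint(0, 33, (2, 73))   # one class index per position

loss = torch.nn.CrossEntropyLoss(ignore_index=1)

# Single replica: input (73, 33) is read as (N=73, C=33) and target (73,) matches.
single = loss(output[0], target[0])
print(single)

# Whole batch: per the docs, a 3-D input is read as (N, C, d1) = (2, 73, 33),
# so a target of shape (2, 33) is expected and (2, 73) is rejected.
try:
    loss(output, target)
except (ValueError, RuntimeError) as err:
    # Exception type and message wording may differ between PyTorch versions.
    print(type(err).__name__, err)
```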
Do I have to loop over the replicas and average them? Any help is much appreciated.
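For reference, here is a sketch of the loop-and-average approach I am asking about, again assuming random dummy tensors of the question's shapes. The token-weighted variant at the end is my own assumption of what "averaging" might need to mean, not something I know to be the intended behavior.

```python
import torch

torch.manual_seed(0)
output = torch.randn(2, 73, 33)          # dummy logits: (replicas, positions, classes)
target = torch.randint(0, 33, (2, 73))   # dummy class indices: (replicas, positions)

loss = torch.nn.CrossEntropyLoss(ignore_index=1)

# Loss of each replica on its own, exactly like loss(output[0], target[0]).
per_replica = torch.stack([loss(output[i], target[i]) for i in range(output.size(0))])

# Plain average over replicas: every replica gets the same weight.
loop_avg = per_replica.mean()
print(per_replica, loop_avg)

# Assumed alternative: sum the per-position losses and divide by the number of
# non-ignored positions (target != 1) across the whole batch.
sum_loss = torch.nn.CrossEntropyLoss(ignore_index=1, reduction="sum")
total = sum(sum_loss(output[i], target[i]) for i in range(output.size(0)))
token_avg = total / (target != 1).sum()
print(token_avg)
```

With the default reduction="mean", each per-replica value is already averaged over that replica's non-ignored positions, so the plain average and the token-weighted average only agree when both replicas have the same number of positions ignored.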