I am following an example in a book. The example uses the nn.NLLLoss() function, and its inputs are confusing me.
The final layer of my model is nn.LogSoftmax, which gives me the tensor output below (I'm trying the example on a single image):
tensor([[-0.7909, -0.6041]], grad_fn=<LogSoftmaxBackward>)
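For context, the tail end of my model looks roughly like this (this is my own simplified reconstruction, not the book's exact code; the layer sizes are placeholders):

import torch
import torch.nn as nn

# Simplified stand-in for the model in the book: the only part that matters
# here is that the final layer is nn.LogSoftmax over the two output classes.
model = nn.Sequential(
    nn.Linear(3072, 512),   # placeholder sizes (a flattened 32x32x3 image)
    nn.Tanh(),
    nn.Linear(512, 2),      # two output classes
    nn.LogSoftmax(dim=1),
)

img = torch.rand(1, 3072)   # a single dummy image, flattened
out = model(img)            # shape (1, 2): log-probabilities for the two classes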
The tensor holds the log-probabilities of the image being a bird or an airplane. The example uses label 0 for a bird and 1 for an airplane.
Now, when applying the loss function, the example passes the tensor above together with the correct label for the image, like this:
loss = nn.NLLLoss()
loss(out, torch.tensor([0]))  # 0 because the image is a bird
I am unable to understand why we are passing the label of the image. My guess is that the label tells the loss function which index of the log-probabilities to use when calculating the loss. However, if that is the case, why do we need to pass the label as a tensor? Couldn't we just pass the label as an index into the out tensor, like this:
loss(out[0, 0])  # [0, 0] since out is a 2D tensor
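To check my guess, I ran a small comparison of my own (this is my test code, not from the book). Assuming out is the tensor printed above, both expressions give the same number, which is why I don't see the point of passing the label tensor:

import torch
import torch.nn as nn

out = torch.tensor([[-0.7909, -0.6041]])  # the LogSoftmax output from above

loss_fn = nn.NLLLoss()
via_nll = loss_fn(out, torch.tensor([0]))  # pass the label as a tensor
by_index = -out[0, 0]                      # negate the indexed log-probability

print(via_nll, by_index)  # both print 0.7909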