Cross Entropy Calculation in PyTorch tutorial

Question

I'm reading the PyTorch tutorial on a multi-class classification problem, and the way the loss is calculated in PyTorch confuses me. Can you help me with this?

The model used for classification goes like this:

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

The training process goes as follows:

optimizer.zero_grad()
outputs = net(inputs)
criterion = nn.CrossEntropyLoss()  # build the loss module once, normally outside the training loop
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

My question is: what exactly does the loss calculation do here? In each iteration, nn.CrossEntropyLoss() receives two inputs:

The output of the model, which is a 10 by 1 tensor of raw values that have not been normalized into probabilities.
The label as a scalar, such as 1, 2, or 3.

As far as I know, cross-entropy is usually calculated between two tensors like these:

Target as [0,0,0,1], where 1 marks the correct class.
Output tensor as [0.1,0.2,0.3,0.4], whose entries sum to 1.

So, based on this assumption, nn.CrossEntropyLoss() here would need to:

First, normalize the output tensor into probabilities.
Encode the label as a one-hot vector, e.g. 2 out of 5 classes as [0,1,0,0,0]. Its length must match that of the output tensor.
Then calculate the loss.

Is this what nn.CrossEntropyLoss() does? Or do we need to one-hot encode the true label before passing it in?

Thank you a lot for your time in advance!

score 7 · Accepted Answer · answered Jun 02 '20 at 23:15

nn.CrossEntropyLoss first applies log-softmax (log(Softmax(x))) to get log-probabilities and then calculates the negative log-likelihood, as mentioned in the documentation:

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
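
The equivalence stated in the documentation can be checked on a small example. This is a minimal sketch, assuming a made-up batch of 4 samples and 10 classes; the names `logits` and `labels` are illustrative and do not come from the tutorial:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 10 classes (raw, unnormalized scores).
logits = torch.randn(4, 10)
# One class index per sample (dtype int64), not one-hot vectors.
labels = torch.tensor([1, 0, 4, 9])

# The combined criterion, applied directly to the raw scores.
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# The same computation written as two explicit steps.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, labels)

print(torch.allclose(loss_ce, loss_nll))
```

Both losses are scalars averaged over the batch, since the default reduction is the mean.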

When using one-hot encoded targets, the cross-entropy can be calculated as follows:

$$H(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i)$$

where y is the one-hot encoded target vector and ŷ is the vector of probabilities for each class. To get the probabilities you would apply softmax to the output of the model. The logarithm of the probabilities is used, and PyTorch just combines the logarithm and the softmax into one operation nn.LogSoftmax(), for numerical stability.

Since all of the values except one in the one-hot vector are zero, only a single term of the sum will be non-zero. Therefore, given the actual class, it can be simplified to:

$$H(y, \hat{y}) = -\log(\hat{y}_{\text{class}})$$

As long as you know the class index, the loss can be calculated directly, making it more efficient than using a one-hot encoded target, hence nn.CrossEntropyLoss expects the class indices.
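
To see that the one-hot sum and the direct lookup give the same number, here is a small sketch in plain Python. The scores in `logits` and the index `target_class` are made up for illustration:

```python
import math

# Hypothetical raw model outputs (logits) for one sample with 4 classes.
logits = [1.0, 2.0, 0.5, 3.0]
target_class = 3  # zero-based index of the correct class

# Softmax: turn the raw scores into probabilities that sum to 1.
exps = [math.exp(v) for v in logits]
total = sum(exps)
probs = [e / total for e in exps]

# One-hot form: sum over all classes, although only one term is non-zero.
one_hot = [1.0 if i == target_class else 0.0 for i in range(len(logits))]
loss_one_hot = -sum(y * math.log(p) for y, p in zip(one_hot, probs))

# Index form: look up the probability of the correct class directly.
loss_indexed = -math.log(probs[target_class])

print(math.isclose(loss_one_hot, loss_indexed))
```

The index form skips building the one-hot vector and the multiplications by zero, which is the efficiency gain described above.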

The full calculation is given in the documentation of nn.CrossEntropyLoss:

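The loss can be described as:

$$\text{loss}(x, \text{class}) = -\log\left(\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])}\right) = -x[\text{class}] + \log\left(\sum_j \exp(x[j])\right)$$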
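
The per-sample formula can also be checked numerically. This is a sketch in plain Python, assuming the expanded form -x[class] + log(sum_j exp(x[j])) and made-up values for `x` and `cls`:

```python
import math

# Hypothetical raw scores for one sample and its zero-based class index.
x = [1.0, 2.0, 0.5, 3.0]
cls = 3

# Softmax followed by the negative log of the correct class's probability.
total = sum(math.exp(v) for v in x)
loss_via_softmax = -math.log(math.exp(x[cls]) / total)

# Expanded form: -x[class] + log(sum_j exp(x[j])).
loss_expanded = -x[cls] + math.log(total)

print(math.isclose(loss_via_softmax, loss_expanded))
```

Only the raw score of the correct class and the log of the summed exponentials are needed, so no one-hot target appears anywhere.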

Hi Michael, thank you a lot for your clear explanation! May I have a double-check with you? Once the `nn.CrossEntropyLoss` receives the label, it will automatically transfer it into the one-hot encoding tensor, where the position of the label is 1, else 0. For example, it receives the label 2, and it will transfer it into `[0,1,0,0]` (For a 4 categories problem). Is this correct? — Nick Nick Nick, Jun 03 '20 at 01:54
There is no one-hot encoding at all. All it does is index the probabilities. Assuming that `y_hat` are the log-probabilities of one sample with the label 2, it would just be `y_hat[2]`. — Michael Jungo, Jun 03 '20 at 03:06
OH!! Silly me, I understand it now! Thank you for your time, Michael! — Nick Nick Nick, Jun 03 '20 at 04:42
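
The indexing described in the comments can be sketched for a whole batch. This is an illustration, not PyTorch's internal implementation; the tensors `logits` and `labels` are made up:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 3 samples, 4 classes.
logits = torch.randn(3, 4)
labels = torch.tensor([2, 0, 3])  # zero-based class indices

# Log-probabilities for every class, shape (3, 4).
y_hat = F.log_softmax(logits, dim=1)

# Pick y_hat[i, labels[i]] for each sample i; no one-hot tensor is built.
picked = y_hat[torch.arange(len(labels)), labels]

# Negate and average over the batch (the default "mean" reduction).
manual_loss = -picked.mean()

print(torch.allclose(manual_loss, F.cross_entropy(logits, labels)))
```

Class indices are zero-based, so a label of 2 selects the third entry of the output, not the second.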
