I'm reading the PyTorch tutorial on a multi-class classification problem, and the way the loss is calculated in PyTorch confuses me a lot. Can you help me with this?
The model used for classification goes like this:
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # Two conv layers with a shared 2x2 max-pool, then three fully connected layers
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)  # flatten the feature maps per sample
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)  # raw scores for the 10 classes, no softmax applied
            return x
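As a quick check on the shapes involved, here is a minimal sketch that pushes a random batch through the same network. The 32x32 RGB input size is an assumption, inferred from the `16 * 5 * 5` in `fc1`, and the batch size of 4 is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net()
inputs = torch.randn(4, 3, 32, 32)  # assumed: batch of 4 RGB images, 32x32 pixels
outputs = net(inputs)
print(outputs.shape)  # one row of 10 raw scores per sample
```

Under these assumptions the output has one row per sample and one column per class, so the batch dimension comes first.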
The training process goes as follows:
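    criterion = nn.CrossEntropyLoss()  # create the loss module once, before the loop

    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)  # call the module on (outputs, labels)
    loss.backward()
    optimizer.step()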
My question is: what exactly does the loss calculation do in PyTorch here? During each iteration, nn.CrossEntropyLoss() takes two inputs:
- The output of the model, which is a 10 by 1 tensor with different values in it. This tensor is not normalized into probabilities.
- The label as a scalar, like 1 or 2 or 3.
As far as I know, cross-entropy is usually calculated between two tensors like:
- Target as [0,0,0,1], where 1 marks the right class
- Output tensor as [0.1,0.2,0.3,0.4], where the values sum to 1.
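For concreteness, the conventional calculation on these two tensors can be sketched in plain Python. This is only an illustration of the textbook formula, and the variable names are made up for the example.

```python
import math

target = [0, 0, 0, 1]            # one-hot target: the last class is the right one
output = [0.1, 0.2, 0.3, 0.4]    # predicted probabilities, summing to 1

# Cross-entropy: -sum_i target_i * log(output_i)
loss = -sum(t * math.log(p) for t, p in zip(target, output))
print(loss)
```

Only the term for the right class survives the sum, so the loss reduces to -log(0.4), roughly 0.916.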
So based on this assumption, nn.CrossEntropyLoss() here needs to achieve:
- First, normalize the output tensor into probabilities.
- Encode the label as a one-hot vector, e.g. label 2 of 5 classes as [0,1,0,0,0]. Its length must be the same as the output tensor's.
- Then calculate the loss.
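The three steps above can be written out by hand and compared against the built-in loss. This is a minimal sketch, not the tutorial's code: the random `logits` and the `labels` tensor are hypothetical stand-ins for the model output and the true classes, and the labels are assumed to be 0-based class indices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a batch of 4 samples with 10 classes each.
logits = torch.randn(4, 10)          # raw, unnormalized scores, shape (N, C)
labels = torch.tensor([1, 2, 3, 0])  # integer class indices in [0, C-1], shape (N,)

# Step 1: normalize the scores (log-softmax, i.e. the log of the probabilities).
log_probs = F.log_softmax(logits, dim=1)

# Step 2: one-hot encode the labels; with 0-based indices, label 2 maps to position 2.
one_hot = F.one_hot(labels, num_classes=10).float()

# Step 3: cross-entropy per sample, averaged over the batch.
manual_loss = -(one_hot * log_probs).sum(dim=1).mean()

# Built-in loss, fed the raw scores and the integer labels directly.
builtin_loss = nn.CrossEntropyLoss()(logits, labels)

print(manual_loss.item(), builtin_loss.item())
```

If the two printed values agree, that suggests the built-in loss handles the normalization and the label lookup internally when given raw scores and integer labels.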
Is this what nn.CrossEntropyLoss() does? Or do we need to one-hot encode the true labels before passing them to the loss function?
Thanks a lot in advance for your time!