
Problem definition:

I have to use the MSELoss function to define the loss for a classification problem, and because of that it keeps raising an error about the shapes of the tensors.

Entire error message:

torch.Size([32, 10]) torch.Size([32])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
     53     output = model.forward(images)
     54     print(output.shape, labels.shape)
---> 55     loss = criterion(output, labels)
     56     loss.backward()
     57     optimizer.step()

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
    429
    430     def forward(self, input, target):
--> 431         return F.mse_loss(input, target, reduction=self.reduction)
    432
    433

/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in mse_loss(input, target, size_average, reduce, reduction)
   2213         ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
   2214     else:
-> 2215         expanded_input, expanded_target = torch.broadcast_tensors(input, target)
   2216         ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
   2217     return ret

/opt/conda/lib/python3.7/site-packages/torch/functional.py in broadcast_tensors(*tensors)
     50             [0, 1, 2]])
     51     """
---> 52     return torch._C._VariableFunctions.broadcast_tensors(tensors)
     53
     54

RuntimeError: The size of tensor a (10) must match the size of tensor b (32) at non-singleton dimension 1

How can I reshape the tensor, and which tensor (output or labels) should I change to calculate the loss?

The entire code is attached below.

import numpy as np
import torch

# Loading the Fashion-MNIST dataset
from torchvision import datasets, transforms

# Get GPU Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('MNIST_data/', download = True, train = True, transform = transform)
testset = datasets.FashionMNIST('MNIST_data/', download = True, train = False, transform = transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 32, shuffle = True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size = 32, shuffle = True, num_workers=4)

# Examine a sample
dataiter = iter(trainloader)
images, labels = dataiter.next()

# Define the network architecture
from torch import nn, optim
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim = 1))
model.to(device)

# Define the loss
criterion = nn.MSELoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr = 0.001)

# Define the epochs
epochs = 5

train_losses, test_losses = [], []

for e in range(epochs):
  running_loss = 0
  for images, labels in trainloader:
    # Flatten Fashion-MNIST images into a 784 long vector
    images = images.to(device)
    labels = labels.to(device)
    images = images.view(images.shape[0], -1)

    # Training pass
    optimizer.zero_grad()
    output = model.forward(images)
    print(output.shape, labels.shape)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()

    running_loss += loss.item()
  else:
    test_loss = 0
    accuracy = 0

    # Turn off gradients for validation, saves memory and computation
    with torch.no_grad():
      # Set the model to evaluation mode
      model.eval()

      # Validation pass
      for images, labels in testloader:
        images = images.to(device)
        labels = labels.to(device)
        images = images.view(images.shape[0], -1)
        ps = model(images)
        test_loss += criterion(ps, labels)
        top_p, top_class = ps.topk(1, dim = 1)
        equals = top_class == labels.view(*top_class.shape)
        accuracy += torch.mean(equals.type(torch.FloatTensor))

    model.train()

    print("Epoch: {}/{}..".format(e+1, epochs),
          "Training loss: {:.3f}..".format(running_loss/len(trainloader)),
          "Test loss: {:.3f}..".format(test_loss/len(testloader)),
          "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))

2 Answers


From the output you print right before the error, the shapes are torch.Size([32, 10]) and torch.Size([32]).

The left one is what the model gives you and the right one is what comes from the trainloader; labels in that form are normally used with something like nn.CrossEntropyLoss.

And from the full error log, the error comes from this line:

loss = criterion(output, labels)

The way to make this work with MSELoss is one-hot encoding; if it were me, for the sake of laziness, I'd write it like this:

ones = torch.eye(10).to(device)        # identity matrix, one row per class (10 classes)
labels = ones.index_select(0, labels)  # pick the one-hot row for each label in the batch
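
If it helps to see what this does to the shapes, here is a minimal sketch with a made-up batch of 4 labels and the same 10 classes (the names and values are just for illustration):

import torch
from torch import nn

labels = torch.tensor([3, 0, 9, 1])       # class indices, shape [4]
ones = torch.eye(10)                       # identity matrix, shape [10, 10]
one_hot = ones.index_select(0, labels)     # one-hot rows, shape [4, 10]

output = torch.randn(4, 10)                # stand-in for the model output
loss = nn.MSELoss()(output, one_hot)       # shapes now match, no broadcasting error
print(one_hot.shape, loss.item())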
  • Thank you for your kind explanation. However, it now gives me another error: "Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select". This error is from the line `labels = ones.index_select(0, labels)`. – boralim Jun 15 '20 at 08:32
  • I edited the answer; just put `to(device)` after the `ones =` line. See if that works. – Natthaphon Hongcharoen Jun 15 '20 at 08:49

Alternatively, you can change your loss function from nn.MSELoss() to nn.CrossEntropyLoss(). Cross-entropy loss is generally preferable to MSE for categorical tasks like this, and in PyTorch's implementation it takes care of the shape handling under the hood, so you can give it the model's vector of per-class scores together with a single class-index label.

Fundamentally, your model attempts to predict what class the input belongs to by calculating a score (you might call it a 'confidence score') for each possible class. So if you have 10 classes, the model's output will be 10-dimensional (in PyTorch, a tensor of shape [10]) and the prediction is the index of the highest score. Often one applies the softmax function (https://en.wikipedia.org/wiki/Softmax_function) to convert these scores to a probability distribution, so each score lies between 0 and 1 and they all sum to 1.
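
For instance, a small sketch with made-up scores for 3 classes:

import torch
import torch.nn.functional as F

scores = torch.tensor([2.0, 0.5, -1.0])    # raw scores for 3 classes
probs = F.softmax(scores, dim=0)           # roughly tensor([0.79, 0.18, 0.04])
print(probs.sum())                         # probabilities sum to 1
print(probs.argmax().item())               # predicted class: index of the highest score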

Cross entropy is then a common choice of loss function for this task: it compares the list of predictions to the one-hot encoded label. E.g. if you have 3 classes, the one-hot label for the first class looks like [1, 0, 0], while a prediction might look like [0.7, 0.1, 0.2]. In PyTorch, nn.CrossEntropyLoss() expects your labels to come as single-value tensors whose value is the class index, since there is no real need to move long, sparse vectors around in memory. So this loss function accomplishes the comparison you want to do and, I'm guessing, is implemented more efficiently than actually creating one-hot encodings.
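
As a rough sketch of the swap (with fake data, and assuming the final LogSoftmax layer is dropped, since nn.CrossEntropyLoss applies log-softmax internally; if you keep LogSoftmax, nn.NLLLoss would be the matching choice):

import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 10))   # no LogSoftmax here
criterion = nn.CrossEntropyLoss()

images = torch.randn(32, 784)               # fake flattened batch
labels = torch.randint(0, 10, (32,))        # class indices, shape [32], no reshaping needed

loss = criterion(model(images), labels)     # works directly with index labels
print(loss.item())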
