
I am having some issues with the dimensionality of the tensors in my training function. I am using the MNIST dataset, so there are 10 possible targets, and I originally wrote the prototype code with a training batch size of 10, which in retrospect was not the wisest choice: it gave poor results in earlier tests, and increasing the number of training iterations brought no benefit. When I then tried to increase the batch size, I realised that what I had written was not very general, and that I was likely never training on the proper data. Below is my training function:

def Train(tLoops, Lrate):
    for _ in range(tLoops):
        tempData = train_data.view(batch_size_train, 1, 1, -1)
        output = net(tempData)
        trainTarget = train_targets
        criterion = nn.MSELoss()
        print("target:", trainTarget.size())
        print("Output:", output.size())
        loss = criterion(output, trainTarget.float())
        # print("error is:", loss)
        net.zero_grad()  # zeroes the gradient buffers of all parameters
        loss.backward()
        for j in net.parameters():
            j.data.sub_(j.grad.data * Lrate)

The print statements there output

target: torch.Size([100])
Output: torch.Size([100, 1, 1, 10])

followed by this error message on the line where the loss is calculated:

RuntimeError: The size of tensor a (10) must match the size of tensor b (100) at non-singleton dimension 3

The first print, target, is a 1-dimensional tensor of the ground-truth values for each image. Output contains the network's output for each of those 100 samples, so effectively a 100 x 10 list; however, from skimming and reshaping the data from 28 x 28 to 1 x 784 earlier, I seem to have picked up unnecessary extra dimensions. Does PyTorch provide a way to remove these? I couldn't find anything in the documentation. Or is there something else that could be my issue?


1 Answer


There are several problems in your training script. I will address each of them below.

  1. First, you should NOT do data batching by hand. PyTorch/torchvision provide functions for that; use a Dataset and a DataLoader: https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html.

  2. You should also NEVER update the parameters of your network by hand. Use an Optimizer: https://pytorch.org/docs/stable/optim.html. In your case, SGD without momentum will have the same effect.

  3. The dimensionality of your input seems to be wrong. For MNIST, an input tensor should be (batch_size, 1, 28, 28), or (batch_size, 784) if you're training an MLP. Furthermore, the output of your network should be (batch_size, 10). A sketch combining all three points follows below.
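
Putting the three points together, here is a minimal sketch of what the loop could look like. The MSE criterion is kept from the question; the stand-in network, the batch size of 100, the hyperparameter values, and the one-hot encoding of the targets (so they match the (batch_size, 10) output) are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Point 1: let torchvision and DataLoader handle the batching.
train_set = datasets.MNIST(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=100, shuffle=True)

# Stand-ins for the question's net and hyperparameters (assumptions).
net = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
tLoops, Lrate = 3, 0.01

criterion = nn.MSELoss()
# Point 2: SGD without momentum performs the same update as the
# manual loop j.data.sub_(j.grad.data * Lrate) in the question.
optimizer = torch.optim.SGD(net.parameters(), lr=Lrate)

for _ in range(tLoops):
    for images, targets in train_loader:
        # Point 3: flatten (batch, 1, 28, 28) -> (batch, 784) for an MLP.
        images = images.view(images.size(0), -1)
        output = net(images)  # shape: (batch, 10)
        # MSELoss needs a target with the same shape as the output, so
        # one-hot encode the class indices here (nn.CrossEntropyLoss
        # would accept the raw indices instead).
        target_vec = F.one_hot(targets, num_classes=10).float()
        loss = criterion(output, target_vec)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

With this structure, changing the batch size only means changing the batch_size argument of the DataLoader; nothing else in the loop depends on it.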

  • With regard to points 1 and 2, I am in fact already using the DataLoader module in PyTorch, which is doing the batching for me. Optimisation is something I know there is a library for, and I intend to use it in the future; this was more an exercise for myself in "getting to know" what is going on, if that makes sense. You hit the nail on the head with your 3rd point: the dimensions should be trimmed using `torch.squeeze()`, and then `torch.permute()` as necessary – Thefoilist Sep 26 '20 at 15:04
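
As a quick illustration of the fix described in this comment (assuming the output shape printed in the question): squeeze() drops every size-1 dimension, so the (100, 1, 1, 10) output collapses directly to (100, 10), and no permute is needed for this particular shape.

import torch

output = torch.randn(100, 1, 1, 10)  # the output shape printed in the question
print(output.squeeze().shape)        # torch.Size([100, 10])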