It looks like using torch.nn.DataParallel changes the output size.
In the official docs (https://pytorch.org/docs/stable/nn.html#torch.nn.DataParallel), the only information about the output size changing is the following:
When module returns a scalar (i.e., 0-dimensional tensor) in forward(), this wrapper will return a vector of length equal to number of devices used in data parallelism, containing the result from each device.
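For example (a minimal sketch, assuming 2 visible GPUs; ScalarModule is a hypothetical module, not my actual code), the documented scalar case behaves like this:

import torch
import torch.nn as nn

class ScalarModule(nn.Module):
    # hypothetical module: forward() returns a 0-dimensional tensor
    def forward(self, x):
        return x.sum()

net = nn.DataParallel(ScalarModule().to("cuda:0"))
x = torch.ones(4, 3, device="cuda:0")
print(net(x))  # with 2 GPUs: a 1-D tensor with one per-device sum, not a single scalar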
My module returns a tensor of 10 coordinates, and I have 2 GPUs on which I want to run the code. The last layer of my CNN is nn.Linear(500, 10).
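For context, a minimal sketch of the kind of CNN class used here (the layer sizes before the final nn.Linear(500, 10) are placeholders, not my real code):

import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    # hypothetical layer sizes; only the final nn.Linear(500, 10) matches my code
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 500)
        self.fc2 = nn.Linear(500, 10)  # returns the 10 coordinates

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(x.size(0), -1)  # flatten, keeping the batch dimension
        x = F.relu(self.fc1(x))
        return self.fc2(x)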
import torch
import torch.nn as nn

net = LeNet()  # CNN class written above
device = torch.device("cuda:0")
net.to(device)
net = nn.DataParallel(net)

# skipped some code where input and target are loaded from files
output = net(input)

criterion = nn.SmoothL1Loss()
loss = criterion(output, target)
Note that without the DataParallel call this piece of code works fine. With DataParallel, a runtime error occurs when computing the loss:
RuntimeError: The size of tensor a (20) must match the size of tensor b (10) at non-singleton dimension 0
It seems that the output size on each GPU separately is 10, as expected, but afterwards the two outputs are concatenated, and that's where the size of 20 comes from.
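To illustrate my guess, here is a minimal sketch (assuming 2 visible GPUs; Toy is a hypothetical module whose forward() returns a 1-D tensor of 10 values with no batch dimension):

import torch
import torch.nn as nn

class Toy(nn.Module):
    # hypothetical module: forward() returns 10 values with no batch dimension
    def forward(self, x):
        return torch.ones(10, device=x.device)

net = nn.DataParallel(Toy().to("cuda:0"))
x = torch.zeros(4, 3, device="cuda:0")  # batch of 4, split across the 2 GPUs
print(net(x).shape)  # torch.Size([20]): two length-10 outputs concatenated along dim 0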
When I change the output size in the CNN class from 10 to 5, it starts working again, but I'm not sure this is the right solution, or that the CNN will still work properly.