I wrapped my PyTorch model in DataParallel for multi-GPU training, but the model does not consistently return outputs with the expected dimensions. In the training loop, the output shape is correct for the first two batches, but on the third batch it changes and causes an error when calculating the loss:
I also tried the solution suggested in this post, but it didn't help.
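For context, here is a minimal sketch of the kind of training loop I'm describing. The `SimpleNet` model, the dummy tensors, and the hyperparameters are illustrative placeholders, not my actual model or data:

```python
# Minimal sketch of the DataParallel training setup (placeholder model/data).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class SimpleNet(nn.Module):
    def __init__(self, in_features=32, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        # Expected output shape: (batch_size, num_classes)
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = SimpleNet()
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch along dim 0 across the GPUs
    # and concatenates the per-GPU outputs back along dim 0.
    model = nn.DataParallel(model)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy data standing in for my real dataset.
inputs = torch.randn(100, 32)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(inputs, labels), batch_size=16, shuffle=True)

model.train()
for batch_idx, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    out = model(x)
    # I expect (batch_size, num_classes) on every iteration,
    # but the shape changes on the third batch.
    print(batch_idx, out.shape)
    loss = criterion(out, y)  # the error is raised here
    loss.backward()
    optimizer.step()
```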