https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#conv3d describes the input to Conv3d as a tensor of shape (N, Cin, D, H, W). Imagine I have a sequence of images that I want to pass to a 3D CNN. Am I right that the dimensions map as follows (a small sketch follows the list)?
- N -> number of sequences (mini-batch size)
- Cin -> number of channels (3 for RGB)
- D -> number of images in a sequence
- H -> height of one image in the sequence
- W -> width of one image in the sequence
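If that reading is correct, the forward pass should look roughly like this minimal sketch (the channel count and kernel size here are made up for illustration):

```python
import torch
import torch.nn as nn

# A batch of 2 sequences, each with 5 RGB frames of 396x247 pixels:
# shape (N, C_in, D, H, W)
x = torch.randn(2, 3, 5, 396, 247)

# Conv3d convolves over D, H and W; channels stay in dim 1
conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(x)
print(out.shape)  # torch.Size([2, 8, 5, 396, 247])
```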
The reason I am asking is that when I stack image tensors with a = torch.stack([img1, img2, img3, img4, img5]), a has shape torch.Size([5, 3, 396, 247]), i.e. (D, C, H, W). So is it compulsory to permute my tensor (not reshape, which would scramble the pixel data) to torch.Size([3, 5, 396, 247]), so that the channel dimension comes first, or does the order not matter inside the DataLoader?
Note that the DataLoader will add one more dimension automatically, which corresponds to N.
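In other words, is something like the following necessary? (A minimal sketch; TensorDataset with a single clip just stands in for my real dataset.)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Five RGB frames of 396x247, stacked along a new first dim -> (D, C, H, W)
frames = [torch.randn(3, 396, 247) for _ in range(5)]
a = torch.stack(frames)        # torch.Size([5, 3, 396, 247])

# Swap the depth and channel axes -> (C, D, H, W); permute, not reshape,
# since reshape would scramble the pixel data
clip = a.permute(1, 0, 2, 3)   # torch.Size([3, 5, 396, 247])

# Stand-in dataset with a single clip; the DataLoader prepends N
dataset = TensorDataset(clip.unsqueeze(0))
loader = DataLoader(dataset, batch_size=1)

(batch,) = next(iter(loader))
print(batch.shape)             # torch.Size([1, 3, 5, 396, 247])
```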