Let's say input to intermediate CNN layer is of size 512×512×128 and that in the convolutional layer we apply 48 7×7 filters at stride 2 with no padding. I want to know what is the size of the resulting activation map?
I checked some previous posts (e.g., here or here) to point to this Stanford course page. And the formula given there is (W − F + 2P)/S + 1 = (512 - 7)/2 + 1, which would imply that this set up is not possible, as the value we get is not an integer.
However if I run the following snippet in Python 2.7, the code seems to suggest that the size of activation map was computed via (512 - 6)/2, which makes sense but does not match the formula above:
>>> import torch
>>> conv = torch.nn.Conv2d(in_channels=128, out_channels=48, kernel_size=7, stride=2, padding=0)
>>> conv
Conv2d(128, 48, kernel_size=(7, 7), stride=(2, 2))
>>> img = torch.rand((1, 128, 512, 512))
>>> out = conv(img)
>>> out.shape
(1, 48, 253, 253)
Any help in understanding this conundrum is appreciated.