I'm implementing the basic architecture from this paper: https://arxiv.org/pdf/1705.08260.pdf in PyTorch.
It consists of an autoencoder and a spatial transformer. The autoencoder's output (a disparity map) is fed into the spatial transformer, which is essentially a bilinear sampler, along with the right image, and the output of this bilinear interpolation is compared against the left image with an L1 loss.
But there's a problem: I don't think this code does what I want. The official docs for PyTorch's grid_sample say the grid values must be in the range [-1, 1], but my grid's maximum value is greater than 1. If the code is otherwise correct, should I rewrite the line where the grid is normalized?
My first thought was to rewrite it as (grid / torch.max(grid) - 0.5) * 2 so the values fall between -1 and 1, and then to drop the padding_mode argument, since no values would exceed the range anymore.
If that's correct, let me know so I can be sure I'm on the right path.
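To check my understanding of the [-1, 1] convention, I wrote this small standalone snippet (CPU only, dummy 4x4 image; note it uses align_corners=True so the identity grid maps exactly onto pixel centers, unlike my actual code which uses False). An identity grid built the same way as below reproduces the input, and coordinates far outside the range come back zero-padded:

```python
import torch
import torch.nn.functional as F

N, C, H, W = 1, 1, 4, 4
img = torch.arange(float(N * C * H * W)).reshape(N, C, H, W)

# Identity grid in [0, 1], x first in the last dimension, shape (N, H, W, 2)
ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                        torch.linspace(0, 1, W), indexing='ij')
grid = torch.stack((xs, ys), dim=2).unsqueeze(0)

# grid * 2 - 1 maps [0, 1] onto [-1, 1]; with align_corners=True this is an
# exact identity warp, so `out` matches `img`
out = F.grid_sample(img, grid * 2 - 1, mode='bilinear',
                    padding_mode='zeros', align_corners=True)

# Coordinates far outside [-1, 1] are zero-padded under padding_mode='zeros'
far = (grid + 2) * 2 - 1
out2 = F.grid_sample(img, far, mode='bilinear',
                     padding_mode='zeros', align_corners=True)
```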
import numpy as np
import torch
import torch.nn.functional as F

def bilinear_sampler(images, disps):
    N, C, H, W = images.size()
    # Base sampling grid with x and y in [0, 1]
    mesh_x, mesh_y = np.meshgrid(np.linspace(0, 1, W),
                                 np.linspace(0, 1, H),
                                 indexing='xy')
    mesh_x = torch.from_numpy(mesh_x).cuda()
    mesh_y = torch.from_numpy(mesh_y).cuda()
    mesh_x = mesh_x.repeat(N, 1, 1).type_as(images)
    mesh_y = mesh_y.repeat(N, 1, 1).type_as(images)
    # Shift the x coordinates by the predicted disparities,
    # squeezing (N, 1, H, W) -> (N, H, W)
    grid = torch.stack((mesh_x + disps.squeeze(1), mesh_y), 3)
    # grid_sample expects coordinates in [-1, 1], hence grid * 2 - 1
    output = F.grid_sample(images, grid * 2 - 1, mode='bilinear',
                           padding_mode='zeros', align_corners=False)
    return output
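For context, here is a minimal CPU-only sketch of how the sampler feeds the L1 reconstruction loss described above, with dummy tensors standing in for the real stereo pair and the autoencoder's predicted disparity. It assumes align_corners=True so that zero disparity is an exact identity warp (with align_corners=False the identity is only approximate):

```python
import torch
import torch.nn.functional as F

def bilinear_sampler_cpu(images, disps):
    # CPU variant of the sampler above (no .cuda()), align_corners=True
    N, C, H, W = images.size()
    ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                            torch.linspace(0, 1, W), indexing='ij')
    xs = xs.repeat(N, 1, 1).type_as(images)
    ys = ys.repeat(N, 1, 1).type_as(images)
    grid = torch.stack((xs + disps.squeeze(1), ys), dim=3)
    return F.grid_sample(images, grid * 2 - 1, mode='bilinear',
                         padding_mode='zeros', align_corners=True)

N, C, H, W = 2, 3, 8, 8
left = torch.rand(N, C, H, W)
right = torch.rand(N, C, H, W)
disp = torch.zeros(N, 1, H, W)      # zero disparity: identity warp

# Warp the right image toward the left and compare with L1
warped = bilinear_sampler_cpu(right, disp)
loss = F.l1_loss(warped, left)
```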