I'm implementing the basic architecture from this paper: https://arxiv.org/pdf/1705.08260.pdf in PyTorch.
It consists of an autoencoder and a spatial transformer. The autoencoder's output (a disparity map) is fed into the spatial transformer, which is essentially a bilinear sampler, along with the right image, and the output of this bilinear interpolation is compared against the left image with an L1 loss.
But there's a problem: I don't think this code does what I want. The official docs for PyTorch's grid_sample say the grid values must be in the range [-1, 1], but my grid's maximum value is greater than 1. If the code is otherwise correct, should I rewrite the line where the grid is normalized?
My first thought was to rewrite it as (grid / torch.max(grid) - 0.5) * 2 so the values fall between -1 and 1, and then to drop the padding_mode argument, since no values would exceed the range anymore.
If that's correct, let me know so I can be sure I'm on the right path.
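To check my understanding of the [-1, 1] convention, I wrote this small standalone snippet (CPU only, dummy 4x4 image; note it uses align_corners=True so the identity grid maps exactly onto pixel centers, unlike my actual code which uses False). An identity grid built the same way as below reproduces the input, and coordinates far outside the range come back zero-padded:

```python
import torch
import torch.nn.functional as F

N, C, H, W = 1, 1, 4, 4
img = torch.arange(float(N * C * H * W)).reshape(N, C, H, W)

# Identity grid in [0, 1], x first in the last dimension, shape (N, H, W, 2)
ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                        torch.linspace(0, 1, W), indexing='ij')
grid = torch.stack((xs, ys), dim=2).unsqueeze(0)

# grid * 2 - 1 maps [0, 1] onto [-1, 1]; with align_corners=True this is an
# exact identity warp, so `out` matches `img`
out = F.grid_sample(img, grid * 2 - 1, mode='bilinear',
                    padding_mode='zeros', align_corners=True)

# Coordinates far outside [-1, 1] are zero-padded under padding_mode='zeros'
far = (grid + 2) * 2 - 1
out2 = F.grid_sample(img, far, mode='bilinear',
                     padding_mode='zeros', align_corners=True)
```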
import numpy as np
import torch
import torch.nn.functional as F

def bilinear_sampler(images, disps):
    N, C, H, W = images.size()
    # Base sampling grid with x and y in [0, 1]
    mesh_x, mesh_y = np.meshgrid(np.linspace(0, 1, W),
                                 np.linspace(0, 1, H),
                                 indexing='xy')
    mesh_x = torch.from_numpy(mesh_x).cuda()
    mesh_y = torch.from_numpy(mesh_y).cuda()
    mesh_x = mesh_x.repeat(N, 1, 1).type_as(images)
    mesh_y = mesh_y.repeat(N, 1, 1).type_as(images)
    # Shift the x coordinates by the predicted disparities,
    # squeezing (N, 1, H, W) -> (N, H, W)
    grid = torch.stack((mesh_x + disps.squeeze(1), mesh_y), 3)
    # grid_sample expects coordinates in [-1, 1], hence grid * 2 - 1
    output = F.grid_sample(images, grid * 2 - 1, mode='bilinear',
                           padding_mode='zeros', align_corners=False)
    return output
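For context, here is a minimal CPU-only sketch of how the sampler feeds the L1 reconstruction loss described above, with dummy tensors standing in for the real stereo pair and the autoencoder's predicted disparity. It assumes align_corners=True so that zero disparity is an exact identity warp (with align_corners=False the identity is only approximate):

```python
import torch
import torch.nn.functional as F

def bilinear_sampler_cpu(images, disps):
    # CPU variant of the sampler above (no .cuda()), align_corners=True
    N, C, H, W = images.size()
    ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                            torch.linspace(0, 1, W), indexing='ij')
    xs = xs.repeat(N, 1, 1).type_as(images)
    ys = ys.repeat(N, 1, 1).type_as(images)
    grid = torch.stack((xs + disps.squeeze(1), ys), dim=3)
    return F.grid_sample(images, grid * 2 - 1, mode='bilinear',
                         padding_mode='zeros', align_corners=True)

N, C, H, W = 2, 3, 8, 8
left = torch.rand(N, C, H, W)
right = torch.rand(N, C, H, W)
disp = torch.zeros(N, 1, H, W)      # zero disparity: identity warp

# Warp the right image toward the left and compare with L1
warped = bilinear_sampler_cpu(right, disp)
loss = F.l1_loss(warped, left)
```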