
I know that in convolution layers the kernel size needs to be a multiple of the stride, or else it will produce artifacts in the gradient calculations, like the checkerboard problem. Does the same apply to pooling layers? I read somewhere that max pooling can also cause problems like that. Take this line in the discriminator, for example:

  self.downsample = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)

I have a model (MUNIT) with it, and this is the image it produced:

[image of the model's output]

It looks like the checkerboard problem, or at least a gradient problem, but I checked my convolution layers and didn't find the error described above: they are all of size 4 with stride 2, or an odd size with stride 1.
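As a quick sanity check for that kernel-size/stride condition, you could scan the model automatically; `find_uneven_convs` below is a made-up helper name, not part of MUNIT or PyTorch:

```python
import torch


def find_uneven_convs(model):
    """Flag Conv2d layers whose kernel size is not divisible by the stride
    (a configuration that can contribute to checkerboard artifacts)."""
    flagged = []
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            for k, s in zip(module.kernel_size, module.stride):
                if s > 1 and k % s != 0:
                    flagged.append(name)
                    break
    return flagged


model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=4, stride=2),  # 4 % 2 == 0, fine
    torch.nn.Conv2d(8, 8, kernel_size=3, stride=2),  # 3 % 2 != 0, flagged
)
print(find_uneven_convs(model))  # ['1']
```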

Jarartur

1 Answer


Honestly, this doesn't look like a checkerboard artifact. Also, I don't think the discriminator would be the problem; checkerboard artifacts usually come from the image-restoration part (the generator or decoder).

I took a quick look at MUNIT, and what they use in the decoder is torch.nn.Upsample with nearest-neighbor upsampling (exact code line here).
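For context, that decoder pattern is roughly an upsample followed by a convolution, sketched below; the channel counts and kernel size here are illustrative, not MUNIT's actual values:

```python
import torch

# Nearest-neighbor upsample followed by a conv, the general pattern
# used in MUNIT's decoder (channel sizes below are illustrative)
upsample_block = torch.nn.Sequential(
    torch.nn.Upsample(scale_factor=2, mode="nearest"),
    torch.nn.Conv2d(32, 16, kernel_size=5, stride=1, padding=2),
)

x = torch.randn(1, 32, 16, 16)
print(upsample_block(x).shape)  # torch.Size([1, 16, 32, 32])
```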

You may try to use torch.nn.Conv2d followed by torch.nn.PixelShuffle, something like this:

import torch

in_channels = 32
upscale_factor = 2
out_channels = 16

upsampling = torch.nn.Sequential(
    # Conv produces upscale_factor**2 sub-pixel channels per output channel
    torch.nn.Conv2d(
        in_channels,
        out_channels * upscale_factor * upscale_factor,
        kernel_size=3,
        padding=1,
    ),
    # PixelShuffle rearranges them into a spatially upscaled feature map
    torch.nn.PixelShuffle(upscale_factor),
)

image = torch.randn(1, 32, 16, 16)

upsampling(image).shape  # torch.Size([1, 16, 32, 32])

This allows the network to learn how to upsample the image, instead of merely using torch.nn.Upsample, which the network has no control over (and combined with the initialization trick below it should also be free of checkerboard artifacts).

Additionally, ICNR initialization for the Conv2d should also help (possible implementations here or here). This init scheme initializes the weights so that the layer initially acts like nearest-neighbor upsampling (research paper here).
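A minimal sketch of ICNR, assuming the Conv2d + PixelShuffle setup above (`icnr_` is my own helper name, not a PyTorch API): initialize one sub-kernel per output channel and repeat it across the upscale_factor**2 sub-pixel positions, so all shuffled sub-pixels start out identical.

```python
import torch


def icnr_(weight, upscale_factor=2, init=torch.nn.init.kaiming_normal_):
    """ICNR init (sketch): fill a conv weight so that, after PixelShuffle,
    the layer initially behaves like nearest-neighbor upsampling."""
    out_channels, in_channels, h, w = weight.shape
    sub_kernel = torch.empty(
        out_channels // upscale_factor ** 2, in_channels, h, w
    )
    init(sub_kernel)
    # Repeat each sub-kernel upscale_factor**2 times so every group of
    # sub-pixel channels starts with identical weights
    sub_kernel = sub_kernel.repeat_interleave(upscale_factor ** 2, dim=0)
    with torch.no_grad():
        weight.copy_(sub_kernel)
    return weight


conv = torch.nn.Conv2d(32, 16 * 2 * 2, kernel_size=3, padding=1)
icnr_(conv.weight, upscale_factor=2)
```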

Szymon Maszke