
I need to upsample from a layer of shape H x W x n1 (H: height, W: width, n1: number of filters) to a layer of shape 2H x 2W x n2, where n2 = n1/2 is the new number of filters. One way to achieve this is with transposed convolution operators. However, deconvolution (transposed convolution) operators are known to produce checkerboard artifacts. One way to overcome this problem is to resize first and then apply a convolution. E.g.

output = transpose_conv2d(input, n2, kernel=(k, k), stride=2)  # any kernel k > 1; stride 2 doubles H and W

vs

output = resize(input)  # e.g. bilinear interpolation
output = conv2d(output, n2, kernel=(1,1), stride=1)
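
For concreteness, here is a minimal PyTorch sketch of the two options (the input shape, kernel size 4, and padding are my own illustrative choices, not fixed by the question):

import torch
import torch.nn as nn
import torch.nn.functional as F

n1, n2 = 64, 32                      # n2 = n1 / 2
x = torch.randn(1, n1, 8, 8)         # NCHW input, H = W = 8

# Option 1: transposed convolution, kernel > 1 and stride > 1
tconv = nn.ConvTranspose2d(n1, n2, kernel_size=4, stride=2, padding=1)
y1 = tconv(x)                        # -> (1, 32, 16, 16)

# Option 2: bilinear resize, then a 1x1 convolution
y2 = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
y2 = nn.Conv2d(n1, n2, kernel_size=1, stride=1)(y2)   # -> (1, 32, 16, 16)

print(y1.shape, y2.shape)            # both torch.Size([1, 32, 16, 16])

Both produce the same output shape; they differ in how the new pixels are computed (an overlapping learned kernel vs fixed interpolation followed by a learned channel mix).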

Obviously, in the second case we are only changing the number of filters and "not learning" any features (a 1x1 convolution does not aggregate spatial information from the input layer). But this can be solved by adding another convolution with kernel size larger than 1x1. E.g.

output = resize(input)  # e.g. bilinear interpolation
output = conv2d(output, n2, kernel=(1,1), stride=1)
output = conv2d(output, n2, kernel=(k,k), ...)  # k > 1, with appropriate padding
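
A minimal sketch of this three-step variant, again in PyTorch (the 3x3 kernel with padding=1 is an assumption on my part; any kernel > 1 with matching padding works):

import torch
import torch.nn as nn
import torch.nn.functional as F

n1, n2 = 64, 32
x = torch.randn(1, n1, 8, 8)

out = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
out = nn.Conv2d(n1, n2, kernel_size=1, stride=1)(out)   # change the number of filters
out = nn.Conv2d(n2, n2, kernel_size=3, padding=1)(out)  # aggregate spatial context
print(out.shape)                                        # torch.Size([1, 32, 16, 16])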

Are there any practical differences (besides computational complexity) between the two approaches for upsampling? I understand that the latter solves the problem of checkerboard artifacts.

Foivos
  • I know it's been a while, but I came across this question and was interested. Did you ever manage to notice any differences? – Dieblitzen May 06 '19 at 20:22
  • I ended up going with the second direction, where upsampling takes place with resize + convolution. There was not enough time to train all possible combinations and do a thorough comparison between transposed conv and resize + conv. I noticed a slight improvement (in my limited tests) with resize + conv, but take it with a grain of salt. Here is the architecture I ended up using: https://arxiv.org/abs/1904.00592 – Foivos May 08 '19 at 04:12

0 Answers