I need to go (upsample) from a layer input = H x W x n1, where H is the height, W the width, and n1 the number of filters, to a layer output = 2H x 2W x n2, where n2 = n1/2 is the new number of filters. One way of achieving this is with transposed convolution operators. However, it is known that deconvolution (transposed convolution) operators can lead to checkerboard artifacts. One way to overcome this problem is to resize first and then apply a convolution. E.g.
output = transpose_conv2d(input, n2, kernel=(k,k), stride=2)  # k > 1, e.g. (4,4)
vs
output = resize(input)  # e.g. bilinear interpolation to 2H x 2W
output = conv2d(output, n2, kernel=(1,1), stride=1)
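To make my pseudocode concrete, here is a minimal sketch in PyTorch (the framework and the sizes H = W = 16, n1 = 64, n2 = 32 are just placeholder choices; the idea is framework-agnostic):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes for illustration: H = W = 16, n1 = 64, n2 = n1 // 2
n1, n2 = 64, 32
x = torch.randn(1, n1, 16, 16)  # PyTorch uses NCHW, i.e. n1 x H x W

# Approach 1: transposed convolution (kernel 4, stride 2, padding 1 doubles H and W)
deconv = nn.ConvTranspose2d(n1, n2, kernel_size=4, stride=2, padding=1)
y1 = deconv(x)
print(y1.shape)  # torch.Size([1, 32, 32, 32]) -> 2H x 2W x n2

# Approach 2: bilinear resize, then a 1x1 convolution to change the filter count
up = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
conv1x1 = nn.Conv2d(n1, n2, kernel_size=1, stride=1)
y2 = conv1x1(up)
print(y2.shape)  # torch.Size([1, 32, 32, 32]) -> same output shape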
Obviously, in the second case, we are just changing the number of filters and we are "not learning" any features (a 1x1 kernel does not summarize spatial information from the input layer). But this can be fixed by following up with another convolution whose kernel is larger than 1x1. E.g.
output = resize(input)
output = conv2d(output, n2, kernel=(1,1), stride=1)
# appropriate padding to preserve 2H x 2W
output = conv2d(output, n2, kernel=(k,k), ...)  # k > 1, e.g. (3,3)
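Wrapped up as a module, that pipeline might look like the sketch below (again PyTorch; the 3x3 kernel with padding=1 is my assumed choice for "kernel larger than (1,1)" with shape-preserving padding, and the class name is hypothetical):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResizeConvUpsample(nn.Module):
    """Hypothetical resize-then-convolve upsampling block (my naming)."""
    def __init__(self, n1, n2):
        super().__init__()
        self.conv1x1 = nn.Conv2d(n1, n2, kernel_size=1, stride=1)
        # 3x3 kernel with padding=1 keeps the 2H x 2W spatial size
        self.conv3x3 = nn.Conv2d(n2, n2, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = self.conv1x1(x)     # change the number of filters: n1 -> n2
        return self.conv3x3(x)  # learn spatial features over the upsampled map

block = ResizeConvUpsample(64, 32)
out = block(torch.randn(1, 64, 16, 16))
print(out.shape)  # torch.Size([1, 32, 32, 32])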
Are there any practical differences (besides computational complexity) between the two approaches for upsampling? I understand that the latter solves the problem of checkerboard artifacts.