
What is meant by deconvolution or backwards convolution in convolutional neural nets?

I understand convolution: if we consider a 3x3 window W and a kernel k of the same size, the result of the convolution W*k will be one value. Here k is a matrix with 3x3 elements.
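For example, this is how I picture that single-window convolution (a small numpy sketch; the values are made up):

```python
import numpy as np

W = np.arange(9, dtype=float).reshape(3, 3)   # a 3x3 input window
k = np.array([[1.0, 0.0, -1.0],
              [1.0, 0.0, -1.0],
              [1.0, 0.0, -1.0]])              # a 3x3 kernel

# Elementwise product followed by a sum: one window and one kernel
# of the same size produce a single output value.
value = np.sum(W * k)
print(value)   # -6.0
```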

In my understanding, deconvolution tries to upsample feature maps to get a larger map. Does it use the same convolution matrix that was used to produce the feature maps? If not, how are the gradients calculated for backpropagation? A detailed explanation would be very useful.

user570593

2 Answers


A detailed explanation is well beyond the scope of StackOverflow; this is not a tutorial site.

In general, deconvolution is more of a reverse convolution: each pixel affects the 3x3 area from which it was extracted, applying the Fourier transform of the filter to reverse-engineer the input parameters. It's often used in signal processing to reduce noise, sharpen features, etc.

For example, visualize a dozen data points in the x-y plane, distributed more or less along a quartic curve. There are a variety of best-fit methods to map a 4th-degree equation (or a rolling combination of cubics) to the given points. This is a type of deconvolution.
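For instance, here is a quick sketch of that curve-fitting idea (a minimal numpy example; the specific curve and noise level are made up):

```python
import numpy as np

# A dozen noisy points scattered along a 4th-degree curve
x = np.linspace(-2.0, 2.0, 12)
y = x**4 - 3 * x**2 + 1 + np.random.normal(scale=0.1, size=x.shape)

# Fit a 4th-degree polynomial to recover the underlying coefficients
coeffs = np.polyfit(x, y, deg=4)
print(coeffs)   # roughly [1, 0, -3, 0, 1]
```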

Here are some references; I hope that one or two of them are at the level you need to move forward.

https://en.wikipedia.org/wiki/Deconvolution

https://www.reddit.com/r/MachineLearning/comments/454ksm/tutorial_on_deconvolution/

https://github.com/tensorflow/tensorflow/issues/2169#issuecomment-216607417

Prune
  • I'd like to add a resource that I have found very helpful for understanding deconvolution (also called transposed convolution). In the 4th section of the paper https://arxiv.org/pdf/1603.07285.pdf, "A guide to convolution arithmetic for deep learning" by Dumoulin, they explain in a very intuitive way what a transposed convolution is. – Guillem Cucurull Apr 06 '17 at 10:52
  • This answer is wrong or misleading. "Deconvolution" in neural networks is a poor choice of name and has nothing to do with actual deconvolution. More information here: https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers. – papirrin Jan 06 '18 at 06:09

As pointed out by @papirrin, the answer given by @Prune is a bit misleading. In CNNs (or Fully Convolutional Networks, where deconvolution was first proposed), the deconvolution is not exactly the reverse of a convolution. More precisely, the deconvolution in CNNs only reverses the shape, not the content. The name "deconvolution" is misleading because deconvolution already has a mathematical definition, so below we will use "transposed convolution" to refer to the "deconvolution" in CNNs.

To understand the transposed convolution, express the filters of the convolution operation as a matrix, so that the convolution can be written as Y = WX. In the transposed convolution, we basically transpose that matrix, and the output is computed as Y = W^T X. For some examples, you can refer to https://tinynet.autoai.org/en/latest/induction/convolution.html and https://tinynet.autoai.org/en/latest/induction/convolution-transpose.html.
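Here is a minimal 1D sketch of that matrix view (plain numpy; the kernel and input values are arbitrary):

```python
import numpy as np

k = np.array([1.0, 2.0, 3.0])        # 1D kernel of size 3
X = np.array([4.0, 5.0, 6.0, 7.0])   # 1D input of size 4

# A valid, stride-1 convolution as a matrix: each row holds the kernel
# shifted by one position, so Y = W @ X has size 2.
W = np.array([[k[0], k[1], k[2], 0.0],
              [0.0,  k[0], k[1], k[2]]])
Y = W @ X

# The transposed convolution multiplies by W^T instead, mapping the
# size-2 output back up to the size-4 input shape.
up = W.T @ Y
print(Y.shape, up.shape)   # (2,) (4,)

# Note: only the shape is recovered, not the content: up != X in general.
```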

As for how the matrix in a transposed convolution is obtained, it depends on how you are going to use it. For image segmentation, it is learned during backpropagation. In some visualizations of intermediate feature maps (for example, the ECCV 2014 paper https://arxiv.org/abs/1311.2901), it is derived directly from the corresponding convolution operation. Both ways are fine.

As for how to compute the gradient, it is exactly the same as for a convolution. You can also interpret the transposed convolution as swapping the forward and backward passes of a convolution operation.
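For example, here is a small PyTorch check of that interpretation (a sketch assuming stride 1 and no padding; the tensor shapes are made up): the forward pass of conv_transpose2d reproduces the backward pass of conv2d with respect to its input.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5, requires_grad=True)  # input image
w = torch.randn(1, 1, 3, 3)                      # 3x3 filter

y = F.conv2d(x, w)          # forward convolution: output is 1x1x3x3
g = torch.randn_like(y)     # a pretend upstream gradient
y.backward(g)               # backward pass fills in x.grad

# The transposed convolution applied to the upstream gradient gives
# exactly the gradient of the convolution with respect to its input.
manual = F.conv_transpose2d(g, w)
print(torch.allclose(x.grad, manual))   # True
```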

xzyao