A U-Net architecture is essentially an encoder in the first half and a decoder in the second half. There are different variations of autoencoders (sparse, variational, etc.), and they all compress and then decompress the data, but a U-Net is likewise used for compressing and decompressing. As far as I understand, in simple autoencoders we do not use Transpose2D convolutions, whereas in a U-Net we do use this kind of upsampling. So how does upsampling happen in simple autoencoders, and if we do use Transpose2D convolutions in autoencoders, how is that different from a U-Net?
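(For concreteness, here is a minimal sketch, assuming PyTorch and illustrative layer sizes, of the two common ways a decoder can upsample: a learned transposed convolution versus a fixed interpolation followed by an ordinary convolution. Either approach can appear in a plain autoencoder or in a U-Net.)

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)  # (batch, channels, H, W) bottleneck feature map

# 1) Learned upsampling: a transposed convolution doubles the spatial size.
up_learned = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
print(up_learned(x).shape)  # torch.Size([1, 32, 32, 32])

# 2) Fixed upsampling: nearest-neighbour interpolation followed by a 3x3 conv.
up_fixed = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
)
print(up_fixed(x).shape)  # torch.Size([1, 32, 32, 32])
```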

Good resource: https://www.researchgate.net/post/Are_U-net_and_encoder-decoder_network_the_same – niek tuytel Nov 03 '21 at 15:07
3 Answers
Answer by Przemyslaw-Dolata
I think there is an important difference between U-Nets and pure encoder-decoder networks.
In encoder-decoder nets there is exactly one latent space (L), with a nonlinear mapping from the input (X) to that space (E: X->L) and a corresponding mapping from that latent space to the output space (D: L->Y). There is a clear distinction between the encoder and decoder: the encoder changes the representation of each sample into some "code" in the latent space, and the decoder is able to construct outputs given only such codes. This means you can take such a network apart and use the encoder and decoder separately, as done for example by Schlegl (2019).
In U-Nets, however, this is not the case. There, the output mapping also depends directly on the input space: instead of L->Y, there is [X+L]->Y (a "skip" connection). This means there are no real "encoder" and "decoder" parts in the sense of mapping the sample onto some well-defined latent space and then computing the output from it. You cannot split a U-Net into parts and use them separately, because in order to compute the output, the input is needed as well, along with all of its intermediate representations (since there are multiple latent spaces in the U-Net: X->L1->L2->...->Ln).
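A minimal sketch, assuming PyTorch and toy layer sizes, of the structural point above: the plain encoder-decoder's decoder consumes only the code z, while a U-Net-style decoder also needs the input (and, in a deeper net, its intermediate maps) through a skip connection, so its two halves cannot be taken apart and reused separately.

```python
import torch
import torch.nn as nn

class PlainAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(1, 8, 3, stride=2, padding=1)   # E: X -> L
        self.dec = nn.ConvTranspose2d(8, 1, 2, stride=2)     # D: L -> Y
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z)          # decoder sees only the code z

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(1, 8, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(8, 8, 2, stride=2)
        self.out = nn.Conv2d(8 + 1, 1, 3, padding=1)          # [X + L] -> Y
    def forward(self, x):
        z = self.enc(x)
        u = self.up(z)
        return self.out(torch.cat([u, x], dim=1))  # decoder also needs x (skip connection)

x = torch.randn(1, 1, 32, 32)
print(PlainAutoencoder()(x).shape, TinyUNet()(x).shape)
```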

In autoencoders, the encoding part compresses the input linearly, which creates a bottleneck through which not all features can be transmitted. U-Net, on the other hand, performs deconvolution (transposed convolution) on the upsampling side and overcomes the bottleneck problem of lost features thanks to the connections from the encoder side of the architecture (the expansive path is symmetric to the contracting path). Because of this, the upsampling part of U-Net contains a large number of feature channels, which allows the network to propagate context information to higher-resolution layers.
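A minimal sketch, assuming PyTorch and illustrative channel sizes in the spirit of the original U-Net, of why the expansive path ends up with a large number of feature channels: the upsampled map is concatenated with the symmetric encoder feature map before the next convolutions.

```python
import torch
import torch.nn as nn

decoder_feat = torch.randn(1, 512, 28, 28)   # coming up from the bottleneck
encoder_feat = torch.randn(1, 512, 56, 56)   # saved from the contracting path

up = nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2)
merged = torch.cat([up(decoder_feat), encoder_feat], dim=1)
print(merged.shape)   # torch.Size([1, 1024, 56, 56]) -> 1024 channels feed the next convs
```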

The bottleneck is not a problem and there is no overcoming... U-Net is used for a specific task like segmentation, whereas autoencoders are used for other tasks like reconstruction, generation, denoising, and so on.
