Shape of image after MaxPooling2D with padding='same': calculating layer-by-layer shape in a convolutional autoencoder

Question

Very briefly, my question is about why the image size does not stay the same as the input image size after a max-pooling layer when I use padding='same' in Keras. I am going through the Keras blog post Building Autoencoders in Keras and building a convolutional autoencoder. The autoencoder code is as follows:

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

input_layer = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

According to autoencoder.summary(), the output of the very first layer, Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer), is 28 X 28 X 16, i.e. it has the same spatial size as the input image. This is because padding is 'same'.

In [49]: autoencoder.summary()
(The layer numbers were added by me and are not part of the output.)
_________________________________________________________________
  Layer (type)                 Output Shape             Param #   
=================================================================
1.input_1 (InputLayer)         (None, 28, 28, 1)         0         
_________________________________________________________________
2.conv2d_1 (Conv2D)            (None, 28, 28, 16)        160       
_________________________________________________________________
3.max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0         
_________________________________________________________________
4.conv2d_2 (Conv2D)            (None, 14, 14, 8)         1160      
_________________________________________________________________
5.max_pooling2d_2 (MaxPooling2 (None, 7, 7, 8)           0         
_________________________________________________________________
6.conv2d_3 (Conv2D)            (None, 7, 7, 8)           584       
_________________________________________________________________
7.max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8)           0         
_________________________________________________________________
8.conv2d_4 (Conv2D)            (None, 4, 4, 8)           584       
_________________________________________________________________
9.up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)           0         
_________________________________________________________________
10.conv2d_5 (Conv2D)            (None, 8, 8, 8)           584       
_________________________________________________________________
11.up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)         0         
_________________________________________________________________
12.conv2d_6 (Conv2D)            (None, 14, 14, 16)        1168      
_________________________________________________________________
13.up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)        0         
_________________________________________________________________
14.conv2d_7 (Conv2D)            (None, 28, 28, 1)         145       
=================================================================

The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). The summary() shows the output size of this layer as 14 X 14 X 16. But padding in this layer is also 'same'. So why does the output size not remain 28 X 28 X 16, with padded zeros?

Also, it is not clear how the output shape becomes (14 X 14 X 16) after layer 12, when the input shape coming from the layer before it is (16 X 16 X 8).

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

"The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). The summary() shows the output size of this layer as 14 X 14 X 16. But padding in this layer is also 'same'. So why does the output size not remain 28 X 28 X 16, with padded zeros?"

There seems to be a misunderstanding of what padding does. Padding only takes care of the edge cases (what to do at the boundary of the image). But you have a 2x2 max-pooling operation, and in Keras the default stride equals the pooling size, so strides=2, which halves the image size. You need to specify strides=1 explicitly to avoid that. From the Keras docs:

pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.

strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
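
The effect of the stride can be checked with a little arithmetic. The sketch below is plain Python, not Keras code; output_size is a hypothetical helper written for illustration, and it assumes the usual TensorFlow/Keras sizing rules ('same' gives ceil(n / stride), 'valid' gives floor((n - window) / stride) + 1) along one spatial axis.

```python
import math

def output_size(n, window, stride, padding):
    """Output length along one spatial axis (hypothetical helper, not a Keras API)."""
    if padding == "same":
        # 'same' pads just enough that the window can start at every stride position
        return math.ceil(n / stride)
    if padding == "valid":
        # 'valid' adds no padding, so the window must fit entirely inside the input
        return (n - window) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

# 2x2 pooling with the default stride (equal to pool_size): 28 -> 14
default_pool = output_size(28, window=2, stride=2, padding="same")
# 2x2 pooling with stride 1: the size is preserved, 28 -> 28
unit_stride_pool = output_size(28, window=2, stride=1, padding="same")
# odd input: 'same' rounds up, 7 -> 4 (as in layer 7 of the summary)
odd_input_pool = output_size(7, window=2, stride=2, padding="same")

print(default_pool, unit_stride_pool, odd_input_pool)  # prints: 14 28 4
```

In Keras terms the stride-1 case would correspond to MaxPooling2D((2, 2), strides=(1, 1), padding='same'), although such a layer no longer downsamples, which is what the encoder relies on.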

For the second question:

"Also, it is not clear how the output shape becomes (14 X 14 X 16) after layer 12, when the input shape coming from the layer before it is (16 X 16 X 8)."

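Layer 12 does not have padding='same' specified, so it falls back to the Keras default, padding='valid'. With no padding, a 3x3 kernel only fits in 16 - 3 + 1 = 14 positions along each axis, which gives 14 X 14; the 16 in the last dimension is the number of filters.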
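
To see why the missing padding matters for the final shape, the whole network can be traced by hand. This is a rough sketch in plain Python rather than Keras; output_size and trace_sizes are hypothetical helpers, and the sizing rules ('same' gives ceil(n / stride), 'valid' gives n - window + 1 at stride 1) are assumed to match what Keras does.

```python
import math

def output_size(n, window, stride, padding):
    """Output length along one spatial axis (hypothetical helper, not a Keras API)."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - window) // stride + 1  # 'valid': no padding

def trace_sizes(layer12_padding, size=28):
    """Spatial size after the input and after each of the 13 layers in the question."""
    encoder = [("conv", "same"), ("pool", "same")] * 3
    decoder = [("conv", "same"), ("up", None),
               ("conv", "same"), ("up", None),
               ("conv", layer12_padding), ("up", None),  # layer 12 in the summary
               ("conv", "same")]
    sizes = [size]
    for kind, padding in encoder + decoder:
        if kind == "conv":      # 3x3 convolution, stride 1
            size = output_size(size, 3, 1, padding)
        elif kind == "pool":    # 2x2 max pooling, stride 2
            size = output_size(size, 2, 2, padding)
        else:                   # 2x upsampling doubles the size
            size *= 2
        sizes.append(size)
    return sizes

print(trace_sizes("valid"))  # [28, 28, 14, 14, 7, 7, 4, 4, 8, 8, 16, 14, 28, 28]
print(trace_sizes("same"))   # [28, 28, 14, 14, 7, 7, 4, 4, 8, 8, 16, 16, 32, 32]
```

The encoder rounds 7 up to 4 in the last pooling step, so three doublings alone would give 32 rather than 28. Leaving layer 12 at 'valid' trims 16 down to 14, and the last upsampling then lands exactly on 28, matching the input.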

"Layer 12 does not have padding=same specified." Do you know why exactly ? I understand that it produces the right output shape, but how do you know in advance what UpSampling2D layers should have "same" padding and which not ? In the case where the number of layers in not known in advance, it is not clear to me. — Florent F, Jan 15 '19 at 14:05
