MaxPool2D downsamples its input along its spatial dimensions (height and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. For example, if I apply a 2x2 MaxPooling2D on this array:
array = np.array([
    [[5], [8]],
    [[7], [2]]
])
Then the result would be 8, the maximum value in this array.
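You can verify this with a minimal NumPy sketch: since the 2x2 pooling window covers the entire 2x2 array, the pooled output is simply the array's global maximum.

```python
import numpy as np

# The 2x2 single-channel input from above, shape (2, 2, 1)
array = np.array([
    [[5], [8]],
    [[7], [2]]
])

# A 2x2 window covers the whole array, so pooling
# reduces to taking the global maximum
result = array.max()
print(result)  # 8
```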
As another example, if I apply a 2x2 MaxPooling2D on this array:
array = tf.constant([[[1.], [2.], [3.]],
                     [[4.], [5.], [6.]],
                     [[7.], [8.], [9.]]])
Then the output would be this:
([
[[5.], [6.]],
[[8.], [9.]]
])
What MaxPooling2D did here is slide a 2x2 window over the input and take the maximum value within each window, reducing the input from 3x3 to 2x2 along both height and width. (This example uses strides of 1; note that Keras' MaxPooling2D defaults to strides equal to the pool size, which halves the height and width instead.) If you're still unsure how this works, check this from Keras and this from SO.
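The sliding-window computation can be sketched in plain NumPy (a minimal illustration, not Keras' actual implementation; stride of 1 is used here to match the 3x3 → 2x2 example above):

```python
import numpy as np

def max_pool2d(x, pool_size=2, stride=1):
    """Minimal sketch of 2D max pooling on a (H, W, C) array.

    Keras' MaxPooling2D defaults to stride == pool_size;
    stride=1 is used here to reproduce the example above.
    """
    h, w, c = x.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.zeros((out_h, out_w, c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Take the max over each pool_size x pool_size window,
            # independently per channel
            window = x[i * stride : i * stride + pool_size,
                       j * stride : j * stride + pool_size, :]
            out[i, j] = window.max(axis=(0, 1))
    return out

x = np.array([[[1.], [2.], [3.]],
              [[4.], [5.], [6.]],
              [[7.], [8.], [9.]]])
print(max_pool2d(x)[..., 0])
# [[5. 6.]
#  [8. 9.]]
```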
Now that it is clear that MaxPool2D downsamples the input, let's get back to your question:
Why is a 2x2 MaxPooling used everywhere and not 3x3 or 4x4?
Well, the reason is data retention: applying a 3x3 MaxPooling2D on a matrix of shape (3, 3, 1) results in a (1, 1, 1) matrix, while applying a 2x2 MaxPooling2D (with strides of 1, as above) on the same (3, 3, 1) matrix results in a (2, 2, 1) matrix. Obviously, a (2, 2, 1) matrix can keep more data than a (1, 1, 1) one. Oftentimes, applying a MaxPooling2D operation with a pooling size larger than 2x2 results in a great loss of information, so 2x2 is the better option to choose. This is why you see 2x2 MaxPooling2D 'everywhere', such as in VGG16 (ResNet50's stem actually uses a 3x3 max pool with stride 2, but 2x2 remains by far the most common choice).
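You can check this trade-off with a small shape calculation. The helper below is a hypothetical sketch (not a Keras function) that computes the output shape of 'valid' pooling; stride defaults to the pool size, matching Keras' MaxPooling2D defaults:

```python
def pooled_shape(input_shape, pool_size, stride=None):
    """Output shape of 'valid' max pooling on a (H, W, C) input.

    stride defaults to pool_size, matching Keras' MaxPooling2D.
    """
    if stride is None:
        stride = pool_size
    h, w, c = input_shape
    return ((h - pool_size) // stride + 1,
            (w - pool_size) // stride + 1,
            c)

print(pooled_shape((3, 3, 1), pool_size=3))            # (1, 1, 1)
print(pooled_shape((3, 3, 1), pool_size=2, stride=1))  # (2, 2, 1)
# VGG16-style 2x2 pooling halves each spatial dimension:
print(pooled_shape((224, 224, 64), pool_size=2))       # (112, 112, 64)
```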