In both your examples, assume we have a `[height, width]` kernel applied with strides `[2, 2]`. That means we apply the kernel to a 2-D window of size `[height, width]` on the 2-D input to get an output value, and then slide the window over by 2 (to the right or down) to get the next output value. In both cases you end up with 4x fewer outputs than inputs (2x fewer in each dimension), assuming `padding='SAME'`.
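As a quick sanity check on that 4x claim, with `padding='SAME'` the output size in each dimension is the input size divided by the stride, rounded up. A minimal sketch (the helper name `same_pad_out` is my own, not a TensorFlow API):

```python
import math

def same_pad_out(size, stride):
    # With padding='SAME', output size = ceil(input size / stride)
    return math.ceil(size / stride)

# A 6x6 input with strides [2, 2] yields a 3x3 output: 4x fewer values overall.
print(same_pad_out(6, 2), same_pad_out(6, 2))  # → 3 3
```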
The difference is how the output values are computed for each window:
`conv2d`

- the output is a linear combination of the input values, with a weight for each cell in the `[height, width]` kernel
- these weights become trainable parameters in your model

`max_pool`

- the output is simply the maximum input value within the `[height, width]` window of input values
- there are no weights, so this operation introduces no trainable parameters
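The difference can be made concrete with a plain-numpy sketch of both operations on a single 2-D input. This is an illustration, not the actual TensorFlow implementation; for simplicity it uses VALID-style windows (no padding), and the `windows` helper is hypothetical:

```python
import numpy as np

def windows(x, kh, kw, stride):
    """Yield each kh x kw sliding window of x (VALID padding, no overflow)."""
    H, W = x.shape
    for i in range(0, H - kh + 1, stride):
        for j in range(0, W - kw + 1, stride):
            yield x[i:i + kh, j:j + kw]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
kernel = rng.standard_normal((2, 2))  # in a real model these are trainable weights

# conv2d-style: each output is the weighted sum of a window with the kernel
conv_out = np.array([np.sum(w * kernel) for w in windows(x, 2, 2, 2)]).reshape(2, 2)

# max_pool-style: each output is just the window maximum; no weights involved
pool_out = np.array([w.max() for w in windows(x, 2, 2, 2)]).reshape(2, 2)

print(conv_out.shape, pool_out.shape)  # → (2, 2) (2, 2)
```

Both outputs have the same shape for the same window size and stride; only the per-window computation differs.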