
As mentioned above, both

tf.nn.conv2d with strides = 2

and

tf.nn.max_pool with 2x2 pooling

can reduce the size of the input by half. I know the outputs may differ, but what I don't know is whether the choice affects the final training result. Any clue about this? Thanks.

hyang51
  • A comprehensive analysis of the effect of pooling vs conv2d with stride is given here: https://arxiv.org/pdf/1706.01983.pdf – Ishant Mrinal Aug 09 '17 at 15:29

2 Answers


In both of your examples, assume we have a [height, width] kernel applied with strides [2,2]. That means we apply the kernel to a 2-D window of size [height, width] on the 2-D inputs to get an output value, and then slide the window over by 2 (across or down) to get the next output value.

In both cases you end up with 4x fewer outputs than inputs (2x fewer in each dimension), assuming padding='SAME'.
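A minimal sketch of that shape arithmetic (the helper name `output_size` is hypothetical, not a TensorFlow API): with padding='SAME' and stride 2, each spatial dimension shrinks to ceil(size / 2), so an 8x8 input yields a 4x4 output under either operation.

```python
def output_size(input_size, stride=2):
    # With padding='SAME', the output size is ceil(input_size / stride).
    # -(-a // b) is ceiling division using only integer math.
    return -(-input_size // stride)

h, w = 8, 8
print(output_size(h), output_size(w))  # 4 4 -- 4x fewer values overall
```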

The difference is how the output values are computed for each window:

conv2d

  • the output is a linear combination of the input values times a weight for each cell in the [height, width] kernel
  • these weights become trainable parameters in your model

max_pool

  • the output is just selecting the maximum input value within the [height, width] window of input values
  • there is no weight and no trainable parameters introduced by this operation
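For contrast, the same 2x2 window under max pooling (again with illustrative numbers):

```python
import numpy as np

# Max pooling just selects the largest value in the window.
window = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
out = np.max(window)
print(out)  # 4.0 -- no weights involved, nothing to train
```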
j314erre

The final training results could actually differ: the convolution multiplies the tensor by a learned filter, which you might not want, since it takes extra computation time and can also overfit your model because it adds more weights.
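To make the "more weights" point concrete, here is a rough parameter count for one layer, with channel and kernel sizes chosen purely for illustration:

```python
# Assumed illustrative sizes: 3x3 kernel, 64 input channels, 64 output channels.
kh, kw, c_in, c_out = 3, 3, 64, 64

conv_params = kh * kw * c_in * c_out  # weights a strided conv layer adds
pool_params = 0                       # max pooling adds no parameters at all

print(conv_params, pool_params)  # 36864 0
```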