
Max-pooling is useful in vision for two reasons:

By eliminating non-maximal values, it reduces computation for upper layers.

It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.

Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.

I don't understand: what do the 8 directions mean? And what does

"If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8."

mean?

Gauss

1 Answer


"There are 8 directions in which one can translate the input image by a single pixel."

They are considering 2 horizontal, 2 vertical and 4 diagonal 1-pixel shifts. That gives 8 in total.
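As a quick check of that count, here is a minimal sketch that enumerates the shifts. The name `shifts` and the `(dy, dx)` row/column ordering are assumptions made for illustration, not something taken from the quoted text.

```python
from itertools import product

# Every (dy, dx) with each component in {-1, 0, 1}, except the zero shift
# (which does not move the image at all).
shifts = [
    (dy, dx)
    for dy, dx in product((-1, 0, 1), repeat=2)
    if (dy, dx) != (0, 0)
]

# Shifts along one axis only (horizontal or vertical) versus diagonal shifts.
axis_aligned = [s for s in shifts if 0 in s]
diagonal = [s for s in shifts if 0 not in s]

print(len(shifts), len(axis_aligned), len(diagonal))
```

The axis-aligned group holds the 2 horizontal and 2 vertical shifts, and the diagonal group holds the remaining 4.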

"If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8."

Imagine we are taking the maximum value in a 2x2 region of an image. The image is pre-convolved, though it doesn't matter for the purpose of this explanation.

No matter where exactly in a 2x2 region the maximum value resides, there will be 3 possible 1-pixel translations of the image that result in the maximum value remaining in that particular 2x2 region. Of course an even greater value may be brought from a neighbouring region, but that's beside the point. The point is you get some translation invariance.
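To make this concrete, here is a small pure-Python sketch. The helpers `translate` and `window_max`, the zero fill at the border, the `(dy, dx)` row/column convention and the toy 4x4 image are all assumptions chosen for illustration. The pooling window is held fixed at the top-left 2x2 block while the image is shifted underneath it.

```python
def translate(image, dy, dx, fill=0):
    """Shift a 2D list by (dy, dx); pixels shifted in from outside get `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = image[r][c]
    return out


def window_max(image, top, left, k):
    """Max-pool a single k x k window whose top-left corner is (top, left)."""
    return max(
        image[r][c]
        for r in range(top, top + k)
        for c in range(left, left + k)
    )


# The maximum (9) sits at the bottom-right corner of the top-left 2x2 window.
image = [
    [1, 2, 0, 0],
    [3, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

before = window_max(image, 0, 0, 2)

# The three shifts that keep the 9 inside the same window.
kept = [window_max(translate(image, dy, dx), 0, 0, 2)
        for dy, dx in [(-1, -1), (-1, 0), (0, -1)]]

# A shift that pushes the 9 out of the window.
lost = window_max(translate(image, 1, 1), 0, 0, 2)

print(before, kept, lost)
```

In this toy image nothing larger than 9 exists, so the caveat about a greater value arriving from a neighbouring region does not come into play.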

With a 3x3 region it gets more complex, as the number of 1-pixel translations that keep the maximum value within the region depends on where exactly in the region that maximum value resides. The 5 translations they mention correspond to a location in the middle of an edge in a 3x3 pixel block. A corner location will give 3 translations, while the center one will give all 8.
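The counts above can be reproduced by brute force. The sketch below is only an illustration: the function names and the `(row, col)` indexing are assumptions. For every position inside a k x k window, it counts how many of the 8 one-pixel shifts leave that position inside the window.

```python
from itertools import product

# The 8 one-pixel shifts as (dy, dx) pairs; the zero shift is excluded.
SHIFTS = [
    (dy, dx)
    for dy, dx in product((-1, 0, 1), repeat=2)
    if (dy, dx) != (0, 0)
]


def count_staying_shifts(row, col, k):
    """Count shifts that keep position (row, col) inside a k x k window."""
    return sum(
        0 <= row + dy < k and 0 <= col + dx < k
        for dy, dx in SHIFTS
    )


def counts_for_window(k):
    """Grid of counts, one entry per position in the k x k window."""
    return [[count_staying_shifts(r, c, k) for c in range(k)] for r in range(k)]


for k in (2, 3):
    print(k)
    for row in counts_for_window(k):
        print(row)
```

Every position in a 2x2 window is a corner, so each gets 3. In a 3x3 window the corners get 3, the edge midpoints get 5, and the center gets 8.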

Joseph Artsimovich
  • I don't understand the relationship between pooling and translation invariance. Can you explain more? In "there will be 3 possible 1-pixel translations of the image that result in the maximum value remaining in that particular 2x2 region", which 3 translations are meant? And what does a 1-pixel translation mean? – Gauss Apr 05 '17 at 04:14
  • @Gauss Suppose the maximum value in a 2x2 region was at coordinate (1, 1). Then, image translation of (-1, -1) will move that particular value to position (0, 0), which is still within the 2x2 region. Two other translations that leave the maximum within the 2x2 region are (0, -1) and (-1, 0). – Joseph Artsimovich Apr 05 '17 at 06:24
  • What do (-1,-1), (-1,0), (0,-1) mean? Do the four pixels (0,0), (0,1), (1,0), (1,1) merge into (1,1), the maximum value in the 2x2 region? – Gauss Apr 05 '17 at 11:26
  • @Gauss These are translation vectors. `new_position = old_position + translation_vector`. In my example `(0, 0) = (1, 1) + (-1, -1)`. – Joseph Artsimovich Apr 05 '17 at 11:50
  • I still don't understand. The four pixels merge into 1 pixel, so what do you mean by "1-pixel translations that keep the maximum value within the region"? After the translation, the previous region no longer exists. And for 3x3 regions, I don't understand "The 5 translations they mention correspond to a location in the middle of an edge in a 3x3 pixel block. A corner location will give 3 translations, while the center one will give all 8." Maybe I don't get what you mean by a 1-pixel translation. – Gauss Apr 05 '17 at 13:37
  • @Gauss You have a 2x2 patch of pixels. Suppose the bottom-right pixel in the patch has the highest value. Max-pooling returns that value for that region. Now assume you've translated (shifted) your image by one pixel diagonally, in top-left direction. Now the value in the bottom-right corner of a 2x2 patch moved to its top-left corner. Unless an even higher value was brought into the patch from the right or the bottom, max-pooling will still return the same value as before. – Joseph Artsimovich Apr 05 '17 at 13:53