
Let us assume a tensor like this:

    import tensorflow as tf

    x = tf.constant([[1., 2., 3.],
                     [4., 5., 6.],
                     [7., 8., 9.]])

To apply the average pooling function, I will do this:

    x = tf.reshape(x, [1, 3, 3, 1])  # NHWC: batch, height, width, channels
    avg_pool_2d = tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')
    avg_pool_2d(x)

The result is:

    <tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
    array([[[[3. ],
             [4.5]],
            [[7.5],
             [9. ]]]], dtype=float32)>

I can follow the logic behind this result:

(1+2+4+5)/4 = 3
(3+6)/2 = 4.5
(7+8)/2 = 7.5
(9/1) = 9

I think the logic is as follows: normally the pooling window lies entirely inside the tensor. When it does not (see the figure below for an example), the sum is divided by the number of window elements (a) that actually lie inside the tensor. The figure illustrates this for a 4 by 3 tensor with a 2 by 2 pooling window, a 2 by 2 stride, and `padding='same'`.
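The rule I am describing can be written as a small pure-Python sketch. The helper `end_padded_avg_pool` is hypothetical, and the rule itself is my assumption, not TensorFlow's documented algorithm: windows start at index 0, any overhang falls off the right/bottom edge, and each sum is divided by the number of elements inside the tensor.

```python
def end_padded_avg_pool(x, pool, stride):
    """Average-pool a 2-D list `x` under my hypothesis: padding only at the end.

    Each window's sum is divided by the number of elements that lie inside
    the tensor (the value called `a` above).
    """
    rows, cols = len(x), len(x[0])
    out_rows = -(-rows // stride)  # ceil(rows / stride)
    out_cols = -(-cols // stride)  # ceil(cols / stride)
    out = []
    for i in range(out_rows):
        out_row = []
        for j in range(out_cols):
            # Keep only the window positions that fall inside the tensor.
            window = [x[r][c]
                      for r in range(i * stride, min(i * stride + pool, rows))
                      for c in range(j * stride, min(j * stride + pool, cols))]
            out_row.append(sum(window) / len(window))  # divide by a
        out.append(out_row)
    return out


x = [[1., 2., 3.],
     [4., 5., 6.],
     [7., 8., 9.]]
print(end_padded_avg_pool(x, pool=2, stride=2))  # [[3.0, 4.5], [7.5, 9.0]]
```

Applied to the 2 by 5 tensor in the second example (`pool=4, stride=4`), the same rule gives `[[5.0, 7.5]]`.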

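[Figure: 2 by 2 pooling windows with stride 2 on a 4 by 3 tensor, padding='same'; a = number of window elements inside the tensor]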

However, this does not always hold. For example, consider the following tensor:

    y = tf.constant([[1., 2., 3., 4., 5.],
                     [6., 7., 8., 9., 10.]])

Then, I do this:

    y = tf.reshape(y, [1, 2, 5, 1])
    avg_pool_2d = tf.keras.layers.AveragePooling2D(pool_size=(4, 4), strides=(4, 4), padding='same')
    avg_pool_2d(y)

The result is:

    <tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
    array([[[[4.5 ],
             [7.]]]], dtype=float32)>

Following the logic from the first example, I expected the result to be:

(1+2+3+4+6+7+8+9)/8 = 5
(5+10)/2 = 7.5

I am using TensorFlow 2.8.0. What mistake am I making?

desertnaut
mdslt
  • You didn't explain the logic for the first one; you just showed the calculation. I think the padding needs to be handled differently. You have a shape of (2, 5) (ignoring the first index, and the last index is your channels?), and you're applying a 4x4 filter with a stride of 4? You get two padded shapes. – matt May 05 '23 at 14:27
  • I'll add the logic to the original post. For the second example: (i) the tensor is 2 by 5, with one channel, (ii) I use a non-overlapping average pooling function with a pooling filter size of 4 by 4 and a stride of 4 by 4. – mdslt May 05 '23 at 14:32
  • As I understand it, 'same' padding ensures that the marginal values are always included, even if the filter extends beyond the input tensor. If I understand you correctly, I need to extend the tensor so that the filter fits inside it. For the second example, the first pooling area includes `[[1, 2, 3, 4], [6, 7, 8, 9]]`. As you suggested, I should convert it to `[[1, 2, 3, 4], [6, 7, 8, 9], [1, 2, 3, 4], [6, 7, 8, 9]]`, which means `([1+2+3+4+6+7+8+9]+[1+2+3+4+6+7+8+9])/16`. This is again `5`, not the `4.5` returned by TensorFlow. – mdslt May 05 '23 at 15:15
  • @matt, it is interesting that my logic always works when the pooling filter size is 2 by 2! I have a problem only when I increase it to 4 by 4. – mdslt May 05 '23 at 15:24
  • You need to find what the rules are for `padding="same"`. I have used it with convolutions, where it means "produce the same size image after convolution". If you want to narrow it down by guess and check, you can run the filter 10 times, changing each element to 1 and the rest to 0; then you'll know exactly what each element contributes to the final result. – matt May 05 '23 at 21:25
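A minimal sketch of the one-hot probe suggested in the last comment, written in pure Python so it runs without TensorFlow. `probe_contributions`, `windows_from_probe`, and the stand-in pooling function `global_avg` are hypothetical helpers; probing the real Keras layer would require wrapping it as `pool_fn`, as shown in the code comment.

```python
def probe_contributions(pool_fn, rows, cols):
    """Feed pool_fn a one-hot input for every position; return {(r, c): outputs}."""
    contributions = {}
    for r in range(rows):
        for c in range(cols):
            one_hot = [[1.0 if (i, j) == (r, c) else 0.0 for j in range(cols)]
                       for i in range(rows)]
            contributions[(r, c)] = pool_fn(one_hot)
    return contributions


def windows_from_probe(contributions):
    """Group input positions by the output index they affect (nonzero output)."""
    windows = {}
    for pos, outputs in contributions.items():
        for k, value in enumerate(outputs):
            if value != 0.0:
                windows.setdefault(k, []).append(pos)
    return windows


def global_avg(a):
    """Stand-in pool_fn: mean of the whole 2-D input, as a flat list of outputs."""
    return [sum(map(sum, a)) / (len(a) * len(a[0]))]


# To probe the real layer instead (assumes TensorFlow is installed and
# avg_pool_2d is the 4x4 layer from the question), pool_fn could be:
#   lambda a: avg_pool_2d(tf.reshape(tf.constant(a), [1, 2, 5, 1])).numpy().ravel().tolist()
contributions = probe_contributions(global_avg, rows=2, cols=5)
print(contributions[(0, 0)])                      # [0.1]
print(len(windows_from_probe(contributions)[0]))  # 10
```

For an average pool, the value a one-hot position produces is 1 divided by the number of elements averaged in its window, and `windows_from_probe` lists which input positions each output reads.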

2 Answers


If you look at the documentation for AveragePooling2D, it describes the padding argument like this:

"same" results in padding evenly to the left/right or up/down of the input such that output has the same height/width dimension as the input.

This says that padding can be added on both sides of the input. That means the first value is:

(1 + 2 + 3 + 6 + 7 + 8)/6 = 4.5

The second value is

(4 + 5 + 9 + 10)/4 = 7

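You were assuming the first window would cover columns 1-4, but there is padding on both sides of the image: 1 column at the beginning and 2 at the end. Hence the first value is computed from columns 1-3 and the second from columns 4-5. Along the height, the 2 rows get 1 row of padding above and 1 below, so both rows are included in each window. The padded positions are not counted in the average, which is why the divisors are 6 and 4.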
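This split appears to match the 'SAME' padding formula described in the notes on padding in TensorFlow's `tf.nn` documentation: the total padding is computed from the output size, and the smaller half goes before the input. A minimal pure-Python sketch, assuming the Keras layer follows that formula (`same_padding` and `window_ranges` are hypothetical helpers):

```python
import math


def same_padding(in_size, pool, stride):
    """Return (pad_before, pad_after) for 'same' padding along one dimension."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + pool - in_size, 0)
    pad_before = pad_total // 2  # the smaller half goes before the input
    return pad_before, pad_total - pad_before


def window_ranges(in_size, pool, stride):
    """Valid (start, stop) input indices, stop exclusive, for each output position."""
    pad_before, _ = same_padding(in_size, pool, stride)
    ranges = []
    for k in range(math.ceil(in_size / stride)):
        start = k * stride - pad_before  # may be negative, i.e. inside the padding
        ranges.append((max(start, 0), min(start + pool, in_size)))
    return ranges


print(same_padding(5, 4, 4))   # (1, 2): 1 column before, 2 after
print(window_ranges(5, 4, 4))  # [(0, 3), (3, 5)]: columns 1-3 and 4-5
print(same_padding(2, 4, 4))   # (1, 1): 1 row above, 1 below
print(same_padding(3, 2, 2))   # (0, 1): the 2x2 case pads only at the end
```

Multiplying the number of valid rows (2) by the valid columns in each range (3 and 2) gives the divisors 6 and 4 used above. With a 2x2 pool and stride, the total padding is at most 1, and `1 // 2 == 0`, so all of it lands at the end.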

In your 2x2 case, if padding is needed there is only 1 row/column of it, and it appears to always be added at the end.

The documentation isn't clear though, so I wouldn't rely on this behavior.

matt

When the pool size is 2 by 2, padding is always added on the right-hand side (and bottom) of the tensor. But when it is 4 by 4, padding can be added on both left/right and top/bottom. Please look at the following examples:

    import tensorflow as tf

    avg = tf.keras.layers.AveragePooling2D(pool_size=(4, 4), strides=(4, 4), padding='same')

Scenario 1:

    y = tf.constant([[1., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]])
    y = tf.reshape(y, [1, 2, 5, 1])
    avg(y)

Result:

    <tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
    array([[[[0.16666667],
             [0.        ]]]], dtype=float32)>


Scenario 2:

    y = tf.constant([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 1.]])
    y = tf.reshape(y, [1, 2, 5, 1])
    avg(y)

Result:

    <tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
    array([[[[0.  ],
             [0.25]]]], dtype=float32)>


Scenario 3:

    y = tf.constant([[0., 0., 0., 1., 0.], [0., 0., 0., 0., 1.]])
    y = tf.reshape(y, [1, 2, 5, 1])
    avg(y)

Result:

    <tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
    array([[[[0. ],
             [0.5]]]], dtype=float32)>

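The three scenarios can be reproduced with a pure-Python emulation. This is a sketch under two assumptions inferred from the outputs above, not TensorFlow's actual implementation: the total padding in each dimension is split with the smaller half before the input, and padded positions are excluded from each window's average. `same_avg_pool_2d` is a hypothetical helper.

```python
import math


def same_avg_pool_2d(x, pool, stride):
    """Emulate 'same'-padded average pooling on a single-channel 2-D list.

    Assumes the smaller half of the padding goes before the input and that
    padded positions are left out of each window's average.
    """
    def ranges(n):
        out = math.ceil(n / stride)
        total = max((out - 1) * stride + pool - n, 0)
        before = total // 2
        return [(max(k * stride - before, 0), min(k * stride - before + pool, n))
                for k in range(out)]

    result = []
    for r0, r1 in ranges(len(x)):
        row = []
        for c0, c1 in ranges(len(x[0])):
            window = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))  # divide by valid elements only
        result.append(row)
    return result


scenarios = {
    1: [[1., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]],
    2: [[0., 0., 0., 0., 0.], [0., 0., 0., 0., 1.]],
    3: [[0., 0., 0., 1., 0.], [0., 0., 0., 0., 1.]],
}
for name, y in scenarios.items():
    print(name, same_avg_pool_2d(y, pool=4, stride=4))
# 1 [[0.16666666666666666, 0.0]]
# 2 [[0.0, 0.25]]
# 3 [[0.0, 0.5]]
```

Python floats are 64-bit, so scenario 1 prints `0.16666666666666666` where TensorFlow's float32 output shows `0.16666667`. The same function returns `[[4.5, 7.0]]` for the 2 by 5 tensor in the question and `[[3.0, 4.5], [7.5, 9.0]]` for the 3 by 3 one.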

mdslt