
Recently I trained a neural network using PyTorch, and it contains an average pooling layer with padding. I'm confused about the behavior of this layer, as well as the definition of average pooling with padding.

For example, suppose we have an input tensor:

[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]

When padding is 1 and the kernel size is 3, the input to the first kernel window should be:

[[0, 0, 0],
 [0, 1, 2],
 [0, 4, 5]]

The output from PyTorch is 12/4 = 3 (ignoring the padded zeros), but I think it should be 12/9 ≈ 1.333.

Can anyone explain this to me?

Much appreciated.

Shai
Wenbin Xu

1 Answer


It's basically up to you to decide how you want your padded pooling layer to behave.
This is why PyTorch's avg pool (e.g., nn.AvgPool2d) has an optional parameter count_include_pad=True:
By default (True), avg pool first pads the input and then treats all elements, padded or not, the same. In that case the output of your example would indeed be 12/9 ≈ 1.33.
On the other hand, if you set count_include_pad=False, the pooling layer ignores the padded elements in the denominator, and the result in your example would be 12/4 = 3.
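A minimal sketch showing both behaviors on your exact input (the corner window sums 1 + 2 + 4 + 5 = 12 either way; only the divisor changes):

```python
import torch
import torch.nn as nn

# Your 3x3 input, shaped (batch, channels, H, W) as AvgPool2d expects.
x = torch.tensor([[[[1., 2., 3.],
                    [4., 5., 6.],
                    [7., 8., 9.]]]])

# Default: padded zeros count toward the divisor (12 / 9).
pool_incl = nn.AvgPool2d(kernel_size=3, stride=1, padding=1,
                         count_include_pad=True)

# Padded zeros are excluded from the divisor (12 / 4).
pool_excl = nn.AvgPool2d(kernel_size=3, stride=1, padding=1,
                         count_include_pad=False)

print(pool_incl(x)[0, 0, 0, 0].item())  # 1.333... = 12/9
print(pool_excl(x)[0, 0, 0, 0].item())  # 3.0      = 12/4
```

Since you observed 3, the layer in your network was evidently running with count_include_pad=False.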

Shai
  • Thank you so much. So both 3 and 1.33 can be the right outcome, depending on what we want? And regarding other platforms like Caffe: judging from the source code, it seems Caffe only supports count_include_pad=False? – Wenbin Xu Apr 18 '19 at 05:11