I am confused about how max-pooling is defined in TensorFlow. The documentation is vague and does not explain the parameters well.
The pooling documentation only says:

ksize: A list of ints that has length >= 4. The size of the window for each dimension of the input tensor.
strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.

and

Each pooling op uses rectangular windows of size ksize separated by offset strides. For example, if strides is all ones every window is used, if strides is all twos every other window is used in each dimension, etc.

What is the TensorFlow equivalent of the following Caffe max-pooling layer?

layer {
  name: "pool"
  type: "Pooling"
  bottom: "relu"
  top: "pool"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

I'm not sure whether strides of all ones, [1,1,1,1], mean overlapping pooling and strides of all twos, [2,2,2,2], mean non-overlapping pooling when the documentation says:

if strides is all ones every window is used, if strides is all twos every other window is used in each dimension, etc.

Hossein

1 Answer

To do max-pooling in TensorFlow, use:

tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)

where ksize defines the window used for max-pooling. Note that you must specify the window size for each dimension of the input tensor. This is the biggest difference from Caffe, which does all the dimension calculations for you. With the default data_format='NHWC', the four dimensions are batch, height, width, and channels, so a 2x2 spatial window is ksize=[1, 2, 2, 1]. The size of the channel dimension depends on the number of outputs of the previous convolutional layer, but the window size in that dimension is normally 1.
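As a rough illustration of the call above, here is a minimal sketch of a 2x2 window with stride 2, which corresponds to kernel_size: 2 and stride: 2 in the Caffe layer from the question. The input tensor `x` is made up for the example, and the choice of padding='VALID' is an assumption; for an even-sized input like this one, 'VALID' and 'SAME' give the same output shape with this window and stride.

```python
import numpy as np
import tensorflow as tf

# Made-up input: one 4x4 single-channel image in NHWC layout,
# i.e. [batch, height, width, channels].
x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)

# 2x2 window and stride 2 over height and width only. The 1s in the first
# and last positions leave the batch and channel dimensions unpooled.
pooled = tf.nn.max_pool(
    x,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='VALID',  # assumption: the Caffe layer above has no padding
)

print(pooled.shape)  # (1, 2, 2, 1)
```

Height and width are halved, while the batch and channel dimensions keep their size.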

Stride has the same effect as in Caffe (it "skips" inputs). However, you must again specify the stride for each dimension of the input.
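To make the effect of the stride concrete, here is a small sketch in plain Python along a single dimension, without padding. The helpers `window_starts` and `max_pool_1d` are hypothetical names written for this illustration and are not part of TensorFlow. With a window of size 2, a stride of 1 gives overlapping windows and a stride of 2 gives non-overlapping ones.

```python
def window_starts(length, ksize, stride):
    """Start indices of all full windows along one dimension (no padding)."""
    return list(range(0, length - ksize + 1, stride))


def max_pool_1d(values, ksize, stride):
    """Take the maximum of each window; an illustration, not TensorFlow's code."""
    return [max(values[s:s + ksize])
            for s in window_starts(len(values), ksize, stride)]


row = [1, 5, 2, 4, 3, 0]

# Stride 1: windows start at 0, 1, 2, 3, 4 and share elements (overlapping).
overlapping = max_pool_1d(row, ksize=2, stride=1)

# Stride 2: windows start at 0, 2, 4 and share nothing (non-overlapping).
non_overlapping = max_pool_1d(row, ksize=2, stride=2)

print(overlapping)      # [5, 5, 4, 4, 3]
print(non_overlapping)  # [5, 4, 3]
```

So whether pooling overlaps depends on the stride relative to the window size, not on the stride alone.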

Both ksize and strides must be lists of length 4 or more.

See the documentation here:

https://www.tensorflow.org/api_docs/python/nn/pooling

Kev1n91
  • I don't get it! Most of what you said here is stated in the TensorFlow documentation as well, and that's vague to me. For example, what do `ksize=[1,2,2,1]` and `strides = [1,1,1,1]` mean? In `tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool1')`, what are the first and last digits for? – Hossein Jan 20 '17 at 15:28
  • I think the first one is the batch size, followed by channel, width, and height in some order. – Kev1n91 Feb 13 '17 at 13:28