I went through the documentation of tf.nn.max_pool_with_argmax, which says:

Performs max pooling on the input and outputs both max values and indices.

The indices in argmax are flattened, so that a maximum value at position [b, y, x, c] becomes flattened index ((b * height + y) * width + x) * channels + c.

The indices returned are always in [0, height) x [0, width) before flattening, even if padding is involved and the mathematically correct answer is outside (either negative or too large). This is a bug, but fixing it is difficult to do in a safe backwards compatible way, especially due to flattening.

The variables b, y, x and c are not explicitly defined, so I am having trouble implementing this method. Can someone please explain what they refer to?

Anubhav Pandey

1 Answer

I am unable to comment due to reputation.

But I think the variables index the input tensor, which is laid out as [batch, height, width, channels]. b is the batch index, y and x are the row and column of the maximum value in the input feature map, and c is the channel index. They are not kernel parameters; the kernel size is set separately through ksize.

If you are having a problem implementing max pooling with argmax it has little to do with these variables. You might want to specify the issue you are having with Max Pooling.
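
In case it helps, here is a minimal sketch of how a flattened index could be turned back into [b, y, x, c], assuming the formula quoted from the documentation holds as written and that height, width and channels are the dimensions of the input tensor, not of the pooled output. The helper name unflatten_argmax is made up for illustration and is not part of TensorFlow.

```python
def unflatten_argmax(flat_index, height, width, channels):
    """Invert ((b * height + y) * width + x) * channels + c.

    Hypothetical helper: height, width and channels are assumed to be
    the dimensions of the input tensor in [batch, height, width, channels]
    layout.
    """
    rest, c = divmod(flat_index, channels)  # channel index
    rest, x = divmod(rest, width)           # column in the input
    b, y = divmod(rest, height)             # batch index and row in the input
    return b, y, x, c


# Example: input of shape (batch=2, height=4, width=5, channels=3),
# maximum located at b=1, y=2, x=3, c=1.
flat = ((1 * 4 + 2) * 5 + 3) * 3 + 1
print(unflatten_argmax(flat, height=4, width=5, channels=3))  # (1, 2, 3, 1)
```

Under that assumption, (y, x) would be the pixel coordinates to feed into a later step such as clustering, while b and c only say which image and which channel the maximum came from. If your TensorFlow version leaves the batch out of the flattened index, b will simply come out as 0 for every value.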

James Kl
As you will have seen, these values are returned in flattened format, so to extract them I have to apply some math such as c = returned_value % channels, and so on. After that, I need the coordinates of the pixel to perform a clustering operation. My question is whether (x, y) or (b, c) are the coordinates of the pixel in the original image. I went through the GitHub source code and still cannot find what b, c, x and y are. – Anubhav Pandey Dec 25 '18 at 05:18