Given a feature map of dimensionality MxNxC (for example, the output of a predicted Region of Interest from a Faster R-CNN), how would one reduce the spatial dimensions to 1x1xC? That is, how would one reduce the feature map to a vector-like quantity summarizing the features of the region?
I am aware of the 1x1 convolution; however, this seems to be relevant to the channel reduction case. Average and max pooling are also commonly used; however, it seems that these approaches are better suited to a less extreme subsampling case.
Obviously, one may simply compute the mean over the spatial dimensions; however, this seems rather coarse.
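To make the baseline concrete, here is a minimal NumPy sketch of what I mean by collapsing the spatial dimensions. The 7x7x256 shape, the random input, and the function names are assumptions for illustration only, not taken from any particular Faster R-CNN implementation. It also shows the max-based variant, which is the same operation as max pooling with a window that covers the whole MxN extent.

```python
import numpy as np


def global_avg_pool(fmap):
    """Reduce an (M, N, C) feature map to (1, 1, C) by averaging over M and N."""
    return fmap.mean(axis=(0, 1), keepdims=True)


def global_max_pool(fmap):
    """Reduce an (M, N, C) feature map to (1, 1, C) by taking the max over M and N."""
    return fmap.max(axis=(0, 1), keepdims=True)


# Hypothetical RoI feature map: M=7, N=7, C=256 (assumed shape, random values).
rng = np.random.default_rng(0)
roi_features = rng.standard_normal((7, 7, 256))

avg_vec = global_avg_pool(roi_features)
max_vec = global_max_pool(roi_features)

print(avg_vec.shape, max_vec.shape)
```

Both results have shape 1x1xC, so each channel is summarized by a single number and all spatial layout within the region is discarded, which is the coarseness I am asking about.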