I want to know the feature map shape (height and width only, not depth) of deep CNN backbones (e.g. ResNets, Inception-v3) for a given image size, so that I can generate the proper anchor boxes. Some TensorFlow implementations use the output_stride option of ResNet, but I don't see such an option in the keras.applications module.

Right now, the only way I know to find the feature map sizes is to do a forward pass, which is overkill.

By trial and error, I've derived heuristic formulas for the feature map shapes of VGG16 and ResNet50, which seem to work well for now, but I can't verify that they are 100% correct. I haven't been able to derive such a formula for the InceptionV3 model.

import numpy as np

def vgg16_output_shape(input_shape):
    # Heuristic: overall stride of 32, rounding down.
    x = np.asarray(input_shape).astype(int)
    return x // 32

def resnet50_output_shape(input_shape):
    # Heuristic: five successive n -> ceil(n / 2) steps. For integers the
    # nested ceilings collapse to a single ceil(n / 32).
    x = np.asarray(input_shape).astype(int)
    return -(-x // 32)

def inceptionv3_output_shape(input_shape):
    ...
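
As a sanity check on the two heuristics above: repeated floor or ceiling division by integers collapses into a single division, so both formulas amount to one stride-32 step with different rounding. The following is a minimal sketch that checks the ceiling case by brute force; the helper names `nested_ceil_div2` and `ceil_div` are made up for illustration and are not part of any library.

```python
import math

def nested_ceil_div2(n, times=5):
    # Apply n -> ceil(n / 2) repeatedly, as in resnet50_output_shape.
    for _ in range(times):
        n = math.ceil(n / 2)
    return n

def ceil_div(n, d):
    # Integer ceiling division without going through floats.
    return -(-n // d)

# Compare the nested form against a single ceiling division by 32.
mismatches = [n for n in range(1, 2049)
              if nested_ceil_div2(n) != ceil_div(n, 32)]
print(mismatches)  # an empty list means the two forms agree on 1..2048
```

This only shows that the two ways of writing the formula agree with each other. Whether stride 32 with ceiling rounding matches the actual ResNet50 graph still has to be confirmed against the model itself.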

Does anyone know a way to calculate the output size of a deep CNN without a full forward pass?

P.S.: I know how to calculate the output shape of one conv layer. Since there are hundreds of conv layers in deep CNNs, I would rather do a forward pass than derive the formula using individual conv layers. If anyone has done it already for, say, InceptionV3, that would be great!
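
One observation that may shorten the per-layer route: only layers that change the spatial size matter (strided layers and 'valid'-padded ones), and parallel branches that get concatenated must all produce the same size, so the single-layer formula only has to be chained over a short list. Below is a rough sketch of that idea. The layer list is an assumption based on my reading of the InceptionV3 architecture (stem convs and pools, then the two grid-reduction blocks), not something taken from the Keras API, so it should be checked against model.summary() before being relied on. `layer_out`, `output_shape` and `INCEPTION_V3_LAYERS` are hypothetical names.

```python
def layer_out(n, kernel, stride, padding):
    """Spatial size after one conv/pool layer with Keras-style padding."""
    if padding == "same":
        return -(-n // stride)           # ceil(n / stride)
    return (n - kernel) // stride + 1    # 'valid': no padding

# ASSUMPTION: size-changing layers of InceptionV3 as (kernel, stride, padding).
# Stride-1 'same' layers are omitted because they keep the size unchanged.
INCEPTION_V3_LAYERS = [
    (3, 2, "valid"),  # first 3x3 conv, stride 2
    (3, 1, "valid"),  # second 3x3 conv
    (3, 2, "valid"),  # 3x3 max-pool, stride 2
    (3, 1, "valid"),  # 3x3 conv after the 1x1 conv
    (3, 2, "valid"),  # 3x3 max-pool, stride 2
    (3, 2, "valid"),  # first grid-reduction block
    (3, 2, "valid"),  # second grid-reduction block
]

def output_shape(input_shape, layers):
    # Chain the single-layer formula over each spatial dimension.
    shape = []
    for n in input_shape:
        for kernel, stride, padding in layers:
            n = layer_out(n, kernel, stride, padding)
        shape.append(n)
    return tuple(shape)

print(output_shape((299, 299), INCEPTION_V3_LAYERS))  # (8, 8)
```

With this layer list a 299x299 input gives 8x8, which agrees with the final grid size usually quoted for InceptionV3. The same `output_shape` helper works for any backbone once its size-changing layers are written down. Inputs smaller than a 'valid' kernel are not handled.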

munikarmanish