PyTorch submodules output shape

Question

How is the output shape of submodules in PyTorch determined? Why is the output shape of a certain submodule modified in the code below?

When I separate the head of a classical classifier from its backbone in the following way:

import torch, torchvision
from torchsummary import summary

effnet = torchvision.models.efficientnet_b0(num_classes = 2)

backbone = torch.nn.Sequential(*(list(effnet.children())[0]))
adaptive_pool = list(effnet.children())[1]
head = list(effnet.children())[2]

model = torch.nn.Sequential(*[backbone, adaptive_pool, head])
summary(model, (3,256,256), device = 'cpu') # <== Error

I get the following error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (2560x1 and 1280x2)

This error is due to the modified output shape of the adaptive_pool submodule. To work around this problem, a flatten module can be inserted as follows:

class flatten(torch.nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

model = torch.nn.Sequential(*[backbone, adaptive_pool,  flatten(), head])
summary(model, (3,256,256), device = 'cpu')

Why is the output shape of the adaptive_pool submodule modified?

score 2 · Accepted Answer · edited Sep 28 '22 at 12:16

The output of an nn.AdaptiveAvgPool2d is 4D even when the average is computed globally, i.e. with output_size=1. In other words, the output shape of your global pooling layer is (N, C, 1, 1), not (N, C). The shape has not been modified; it was always 4D, so you do need to flatten it before passing it to the fully connected layer.
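To make the shapes concrete, here is a minimal sketch that skips the real backbone. The feature tensor below is a hypothetical stand-in for the backbone output (1280 channels to match the head in the question, 8x8 spatial size and a batch of 2 chosen as assumptions so that the error reproduces the 2560x1 from the question, since 2 * 1280 = 2560).

```python
import torch
from torch import nn

# Hypothetical stand-in for the backbone output: batch of 2, 1280 channels, 8x8 map.
features = torch.randn(2, 1280, 8, 8)

pool = nn.AdaptiveAvgPool2d(1)
pooled = pool(features)
print(pooled.shape)  # torch.Size([2, 1280, 1, 1]) -> still 4D

fc = nn.Linear(1280, 2)

# nn.Linear acts on the last dimension, which here has size 1 rather than 1280.
try:
    fc(pooled)
    failed = False
except RuntimeError as err:
    failed = True
    print("Linear on the 4D tensor failed:", err)

# Collapsing every dimension after the batch dimension gives (N, C).
logits = fc(torch.flatten(pooled, 1))
print(logits.shape)  # torch.Size([2, 2])
```
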

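In the original torchvision EfficientNet classification network, the flattening operation is done directly in the forward logic, without a dedicated layer. See [this line](https://github.com/pytorch/vision/blob/main/torchvision/models/efficientnet.py#L348). Because it is not a registered submodule, it does not appear in effnet.children() and is lost when you rebuild the model with nn.Sequential.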
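The effect can be reproduced with a toy model. This is only a sketch: TinyClassifier and its layer sizes are hypothetical and merely mimic the features / avgpool / classifier layout, with the flatten step written as a function call inside forward.

```python
import torch
from torch import nn

class TinyClassifier(nn.Module):
    """Hypothetical toy model: flattening happens inside forward()."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # function call, not a registered submodule
        return self.classifier(x)

net = TinyClassifier()
x = torch.randn(2, 3, 32, 32)

# Only registered submodules are listed; the flatten step is absent.
child_names = [name for name, _ in net.named_children()]
print(child_names)  # ['features', 'avgpool', 'classifier']

out = net(x)  # works, because forward() flattens

# Rebuilding from children() drops the flatten step, so the Linear layer
# receives a (N, 16, 1, 1) tensor and raises a shape error.
rebuilt = nn.Sequential(*net.children())
try:
    rebuilt(x)
    rebuilt_failed = False
except RuntimeError:
    rebuilt_failed = True
```
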

Instead of implementing your own flattening layer, you can use the built-in nn.Flatten. More details about this module can be found in the PyTorch documentation for nn.Flatten.

>>> model = nn.Sequential(backbone, adaptive_pool, nn.Flatten(1), head)
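As a quick sanity check, the sketch below compares nn.Flatten(1) with the custom view-based flatten from the question and with torch.flatten. The input tensor is a hypothetical pooled output, not one taken from the actual model.

```python
import torch
from torch import nn

# Hypothetical pooled output of shape (N, C, 1, 1).
pooled = torch.randn(2, 1280, 1, 1)

a = nn.Flatten(1)(pooled)               # built-in module
b = pooled.view(pooled.size(0), -1)     # custom flatten from the question
c = torch.flatten(pooled, 1)            # functional form used inside forward()

print(a.shape)  # torch.Size([2, 1280])
same = torch.equal(a, b) and torch.equal(a, c)
```

All three produce the same (N, C) tensor here, so the choice is mostly a matter of style. nn.Flatten has the advantage of being a module, which lets it sit inside nn.Sequential.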

edited Sep 28 '22 at 12:16 · Nir

answered Sep 28 '22 at 10:18 · Ivan

It seems like in the original `torchvision.models.efficientnet_b0` the adaptive pool has a different output shape, because the original model obviously works without an additional flatten. What am I missing? – Nir Sep 28 '22 at 10:50

The flattening operation is done directly in the forward logic without the use of a dedicated layer. See [this line](https://github.com/pytorch/vision/blob/main/torchvision/models/efficientnet.py#L348). – Ivan Sep 28 '22 at 10:57
