
The main issue is that I don't understand how the upsampling works. The AlexNet architecture has the following classifier:

Dropout(),
Linear(in_features=9216, out_features=4096),
ReLU(),
Dropout(),
Linear(in_features=4096, out_features=4096),
ReLU(),
Linear(in_features=4096, out_features=1000)

The above is used on ImageNet, so the number of classes is 1000. If we change the classifier to be fully convolutional (following Fully Convolutional Networks for Semantic Segmentation) and train on VOC, we'll have 20 classes (why does the figure show 21 channels?). Images are resized to 256x256. The architecture from the paper is shown below.

Figure 1

Using PyTorch I have the following:

import torch.nn as nn
from torchvision.models import AlexNet

class AlexNetFCN(nn.Module):

    def __init__(self):
        super().__init__()
        self.features = AlexNet().features
        self.classifier = nn.Sequential(
                nn.Dropout(),
                nn.Conv2d(256, 4096, kernel_size=6),   # fc6 convolutionalized
                nn.ReLU(),
                nn.Dropout(),
                nn.Conv2d(4096, 4096, kernel_size=1),  # fc7
                nn.ReLU(),
                nn.Conv2d(4096, 20, kernel_size=1),    # per-class score map
                )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

I know that after the feature portion my feature maps are 7x7. How do I upsample back up to 256x256? Maybe nn.Bilinear, but this isn't clear to me from the paper.

pmdaly

1 Answer


You can use the nn.Upsample layer, which is a wrapper around the nn.functional.interpolate function (formerly nn.functional.upsample).
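As a minimal sketch (the 2x2 input size here is an assumption, corresponding to a 7x7 feature map passed through the kernel-size-6 conv in your classifier), bilinear upsampling of the 20-channel score map back to the input resolution looks like this:

```python
import torch
import torch.nn as nn

# Fixed (non-learned) bilinear upsampling back to the 256x256 input size.
upsample = nn.Upsample(size=(256, 256), mode='bilinear', align_corners=False)

# Assumed coarse score map: batch 1, 20 class channels, 2x2 spatial.
scores = torch.randn(1, 20, 2, 2)
out = upsample(scores)
print(out.shape)  # torch.Size([1, 20, 256, 256])
```

You would apply this at the end of forward, after self.classifier, so the network outputs one score per class per pixel.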

Here's how it's used in a working AlexNet FCN implementation:

https://github.com/monaj07/pytorch_semseg/blob/e93e96f811bcdb45c8c88380f561e87d0ccbf514/ptsemseg/models/fcn.py#L145
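Note that the FCN paper itself learns the upsampling with a transposed convolution ("deconvolution") rather than fixed interpolation. A hedged sketch, where the kernel/stride/padding values and the 8x8 input size are illustrative assumptions chosen so the arithmetic comes out to 256, not values from your network:

```python
import torch
import torch.nn as nn

# Learned upsampling by a factor of ~32, in the spirit of FCN-32s.
# Output size = (in - 1) * stride - 2 * padding + kernel
#             = (8 - 1) * 32 - 2 * 16 + 64 = 256
up = nn.ConvTranspose2d(20, 20, kernel_size=64, stride=32,
                        padding=16, bias=False)

scores = torch.randn(1, 20, 8, 8)  # assumed coarse score map
out = up(scores)
print(out.shape)  # torch.Size([1, 20, 256, 256])
```

The paper initializes these weights to perform bilinear interpolation and then lets training refine them, which is why a fixed nn.Upsample is a reasonable simpler starting point.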

zwang