
I am trying to implement a paper on semantic segmentation, and I am confused about how to upsample the prediction map produced by my segmentation network to match the input image size.

For example, I am using a variant of ResNet-101 as the segmentation network (as in the paper). With this network structure, an input of size 321x321 (also used in the paper) produces a final prediction map of size 41x41xC, where C is the number of classes. Because I have to make pixel-level predictions, I need to upsample it to 321x321xC. PyTorch provides a function to upsample to an output size that is a multiple of the prediction map size, so I cannot use that method directly here.
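For reference, recent PyTorch releases provide torch.nn.functional.interpolate, whose size argument takes an explicit output size, so the result does not have to be an integer multiple of the input. The sketch below assumes N x C x H x W logits and a hypothetical C = 21 classes; upsample_to_image is a hypothetical helper name, not part of PyTorch. Using align_corners=True is a design choice here, not something the paper is known to use: it makes the 41 input samples land exactly on every 8th output pixel, since (41 - 1) * 8 = 321 - 1.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the network output: N x C x H x W logits,
# with C = 21 classes chosen only for illustration.
num_classes = 21
logits = torch.randn(1, num_classes, 41, 41)

# `size` accepts any target size; it does not need to be a multiple of 41.
# With align_corners=True, input sample i maps to output pixel 8 * i,
# because (41 - 1) * 8 == 321 - 1.
upsampled = F.interpolate(logits, size=(321, 321), mode="bilinear", align_corners=True)
print(upsampled.shape)  # torch.Size([1, 21, 321, 321])

# Per-pixel class predictions at full resolution.
pred = upsampled.argmax(dim=1)  # shape (1, 321, 321)


def upsample_to_image(logits: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: resize logits to the spatial size (H, W) of `image`."""
    return F.interpolate(logits, size=image.shape[-2:], mode="bilinear", align_corners=True)
```

Taking the size from the image tensor at run time avoids resizing images and labels at inference, whatever their size.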

Because every semantic segmentation network involves this step, I am sure there must be a standard way to implement it.

I would appreciate any pointers. Thanks in advance.

kmario23
ethelion

1 Answer


Perhaps the simplest thing you can try is:

  • Upsample by a factor of 8. Your 41x41 map then becomes 328x328.
  • Perform a center crop to get the desired 321x321 shape (for an N x C x H x W tensor, something like input[:, :, 3:-4, 3:-4]).
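The two steps above can be sketched in PyTorch as follows, assuming N x C x H x W logits, a hypothetical C = 21 classes, and a hypothetical center_crop helper. The crop offset is (328 - 321) // 2 = 3 on the top and left, leaving 4 rows and columns on the bottom and right, which is the same region as the slice 3:-4.

```python
import torch
import torch.nn as nn


def center_crop(x: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Hypothetical helper: crop the spatial center of an N x C x H x W tensor."""
    _, _, h, w = x.shape
    if h < height or w < width:
        raise ValueError(f"cannot crop {h}x{w} to {height}x{width}")
    top = (h - height) // 2
    left = (w - width) // 2
    return x[:, :, top:top + height, left:left + width]


num_classes = 21  # hypothetical number of classes
logits = torch.randn(1, num_classes, 41, 41)

# Step 1: upsample by a fixed factor of 8 (41 * 8 = 328).
upsample = nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False)
up = upsample(logits)  # shape (1, 21, 328, 328)

# Step 2: center-crop back to the input resolution.
out = center_crop(up, 321, 321)  # top = left = (328 - 321) // 2 = 3
print(out.shape)  # torch.Size([1, 21, 321, 321])

# Same region as the slice in the answer.
assert torch.equal(out, up[:, :, 3:-4, 3:-4])
```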
Egor Lakomkin
  • It might be more stable to crop after each upsampling step rather than just once at the end. – Shai Nov 08 '17 at 12:11
  • Is there a well-accepted way to handle upsampling in semantic segmentation? I want to adopt standard practices because I am trying to reproduce the results of a paper for the ICLR Reproducibility Challenge, and the paper does not mention its upsampling strategy. Also, I want my network to upsample to an arbitrary input size so that I don't have to apply transformations to images and labels at inference time (where images can be of any size). Is there a way to do that? – ethelion Nov 14 '17 at 03:49
  • You can also try deconvolution (transposed convolution) as an upsampling method. But in practice, plain nearest-neighbor (Nearest2D) upsampling works very well. – Egor Lakomkin Nov 15 '17 at 10:12
  • Thanks, Egor Lakomkin and Shai. Three upsampling operations using bilinear interpolation, with cropping after each one, worked best. I also tried transposed convolution, but learning its weights turned out to be hard. – ethelion Nov 18 '17 at 08:12
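The staged approach described in the last comment (three bilinear 2x upsampling steps, cropping after each) could look roughly like this. The intermediate target sizes ceil(H / 2**k) are an assumption, not something stated in the thread; for a 321x321 input they give 81, 161 and 321. center_crop and staged_upsample are hypothetical helper names.

```python
import math

import torch
import torch.nn.functional as F


def center_crop(x: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Hypothetical helper: crop the spatial center of an N x C x H x W tensor."""
    _, _, h, w = x.shape
    if h < height or w < width:
        raise ValueError(f"cannot crop {h}x{w} to {height}x{width}")
    top = (h - height) // 2
    left = (w - width) // 2
    return x[:, :, top:top + height, left:left + width]


def staged_upsample(logits: torch.Tensor, out_size, num_stages: int = 3) -> torch.Tensor:
    """Hypothetical helper: upsample by 2 `num_stages` times with bilinear
    interpolation, center-cropping after every stage."""
    height, width = out_size
    x = logits
    for k in range(num_stages - 1, -1, -1):
        # Assumed intermediate target: ceil(H / 2**k) x ceil(W / 2**k).
        target_h = math.ceil(height / 2 ** k)
        target_w = math.ceil(width / 2 ** k)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = center_crop(x, target_h, target_w)
    return x


logits = torch.randn(1, 21, 41, 41)  # hypothetical 21-class output
# 41 -> 82 -> crop 81 -> 162 -> crop 161 -> 322 -> crop 321
out = staged_upsample(logits, (321, 321))
print(out.shape)  # torch.Size([1, 21, 321, 321])
```

Because the targets are computed from the requested output size, the same function can serve inference on images of other sizes, provided the logits are at least ceil(H / 8) x ceil(W / 8); otherwise center_crop raises a ValueError.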