
I'm trying to apply MaxPool2d (from torch.nn) to a single image, not as a pooling layer inside a network. Here is my code right now:

name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3, stride=1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))

The issue is that this code gives me a very green image as a result. I'm sure the problem is with the dimensions passed to view, but I was unable to find how to apply max pooling to just one image, so I couldn't fix it. The image I'm considering is 512x512. The arguments to view make no sense to me right now; (512, 510) is just the only shape that gives a result...

If, for example, I give (512, 512) as the argument to view, I get the following error:

RuntimeError: shape '[512, 512]' is invalid for input of size 261120

If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result, I would be super grateful!

Thanks (:

1 Answer


Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):

import numpy as np
import torch

# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
numpy_img = np.random.randint(low=0, high=256, size=(512, 512, 3))

# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch layers take images in Channels, Height, Width format
# We have to switch their dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]

# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you    
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]

pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)

# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
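
With a 3x3 kernel, stride 1 and no padding, each spatial dimension shrinks by 2, so you can check the pooled shape:

new_img.shape # Shape [1, 3, 510, 510]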

If your image is black and white (a single channel) you would need shape [1, 1, 512, 512]; you can't drop/squeeze those dimensions, they always have to be there for any torch.nn.Module!
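
For example, a minimal sketch for the grayscale case (the random array here is just a stand-in for a real single-channel image):

import numpy as np
import torch

# Hypothetical grayscale image with integer values in [0, 255]
gray_np = np.random.randint(low=0, high=256, size=(512, 512))

# Add the channel and batch dimensions: [1, 1, 512, 512]
gray_tensor = torch.from_numpy(gray_np).unsqueeze(0).unsqueeze(0)

pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
pooled = pooling(gray_tensor.float())
pooled.shape  # Shape [1, 1, 510, 510]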

To transform the tensor back into an image you could use similar steps:

# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)

# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape  # Shape: [510, 510, 3]

# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
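
And, assuming matplotlib as in the question, a sketch of how you might display the result (final_image is the array produced above):

import numpy as np
import matplotlib.pyplot as plt

# final_image has shape (510, 510, 3) with integer values in [0, 255];
# cast to uint8 so imshow renders it as a regular RGB image
plt.imshow(final_image.astype(np.uint8))
plt.show()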
Szymon Maszke
  • I ran the code (it worked!), but my result is a black image. This is because after you cast to long, each entry becomes 0. I checked, and new_img gives me actual numbers, but no_batch gives zeros. Do we need to cast it to long? If so, how do you avoid all 0s? EDIT: I changed long to float and I get the right result; just to understand, why did you choose long first? Thanks for all your help! –  Apr 06 '20 at 08:09
  • @tweepy_ques it depends on what your original `img` is; I assumed it's of `int` type and has `[0-255]` range as you didn't provide this information. If it's a float in `[0,1]` range you __should not__ perform any casting. In the second step it once again depends how you want your image - `int` in range `[0, 255]` or `float` in range `[0,1]`. Also you may want a different data format, e.g. `[Channels, Width, Height]` vs `[Width, Height, Channels]`, with the second being more popular (used in `tensorflow` for example), while the first is used by PyTorch. – Szymon Maszke Apr 06 '20 at 08:17