0

I have defined my autoencoder in PyTorch as follows (it gives me an 8-dimensional bottleneck at the output of the encoder, which works fine: torch.Size([1, 8, 1, 1])):

self.encoder = nn.Sequential(
    nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(7, stride=1)
)

self.decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(32, input_shape[0], kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Sigmoid()
)

What I cannot do is train the autoencoder with this forward pass:

def forward(self, x):
    x = self.encoder(x)
    x = self.decoder(x)
    return x

The decoder raises an error because it cannot upsample the tensor:

Calculated padded input size per channel: (3 x 3). Kernel size: (4 x 4). Kernel size can't be greater than actual input size
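The shape arithmetic behind this error can be traced by hand (a plain-Python sketch, assuming the 1x84x84 Atari input mentioned in the comments below; no padding, dilation 1):

```python
# Output size of a conv/pool layer with no padding: floor((in - k) / s) + 1.
def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

# Encoder, layer by layer:
size = conv_out(84, 8, 4)     # Conv2d(k=8, s=4):   84 -> 20
size = conv_out(size, 4, 2)   # Conv2d(k=4, s=2):   20 -> 9
size = conv_out(size, 3, 1)   # Conv2d(k=3, s=1):   9 -> 7
size = conv_out(size, 7, 1)   # MaxPool2d(7, s=1):  7 -> 1
print(size)  # 1: the 8-channel bottleneck is 1x1

# Transposed-conv output size with no padding: (in - 1) * stride + kernel.
def deconv_out(size, kernel, stride):
    return (size - 1) * stride + kernel

# First decoder layer grows 1 -> 3, but the next Conv2d has
# kernel_size=4 > 3, which is exactly the reported error.
print(deconv_out(size, 3, 1))  # 3
```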
Composer

2 Answers


You are not upsampling enough via ConvTranspose2d; the spatial shape at your encoder's output is only 1 pixel (width x height). See this example:

import torch

layer = torch.nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1)
print(layer(torch.randn(64, 8, 1, 1)).shape)  # torch.Size([64, 64, 3, 3])

This prints exactly the (3, 3) spatial shape after upsampling, which the next Conv2d with kernel_size=4 cannot handle.

You can:

  • Make the kernel smaller: instead of 4 in the first Conv2d of the decoder, use 3, 2, or even 1
  • Upsample more, for example: torch.nn.ConvTranspose2d(8, 64, kernel_size=7, stride=2) would give you 7x7
  • What I would do personally: downsample less in the encoder, so the output shape after it is at least 4x4, or maybe 5x5. If you squash your image that much, there is no way to encode enough information into one pixel, and even if the code runs, the network won't learn any useful representation.
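The first two options can be checked by hand with the transposed-convolution size formula (no padding, dilation 1): out = (in - 1) * stride + kernel. A small sketch:

```python
# Output size of ConvTranspose2d with no padding/output_padding, dilation 1.
def deconv_out(size, kernel, stride):
    return (size - 1) * stride + kernel

# Current decoder: 1x1 -> 3x3, too small for the following kernel-4 Conv2d.
print(deconv_out(1, 3, 1))  # 3

# Suggested alternative: kernel_size=7, stride=2 grows 1x1 -> 7x7.
print(deconv_out(1, 7, 2))  # 7
```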
Szymon Maszke
  • Thank you for the answer. The main limitation is that I would like to have an 8-dimensional representation of the image, hence the reduction to 1 pixel (I would have 8 such pixels that I can use later as classifiers). I would like to have the images classified into 8 classes. P.S. I'm not earning money on this; it's for my own curiosity. Thank you. – Composer Sep 23 '19 at 20:50
  • I was thinking maybe 2x2x2 to get my 8 classes? Could that be better in your opinion? – Composer Sep 23 '19 at 20:53
  • I'm not sure what your goal is, to be honest. An autoencoder's aim is to compress the representation into its most essential parts and uncompress it as losslessly as possible. What's your input shape? Why do you want an autoencoder for classification? Wouldn't a normal approach benefit you more? – Szymon Maszke Sep 23 '19 at 20:56
  • My input shape is (batch, 1, 84, 84), frames from an Atari game in OpenAI Gym. I wanted to use the autoencoder to classify the frames into 8 distinct classes. – Composer Sep 23 '19 at 20:57
  • Could I downsample to torch.Size([1, 2, 2, 2]) at the encoder output? What do you think? – Composer Sep 23 '19 at 20:59
  • I wouldn't use an autoencoder at all; a simple convolutional classifier with 8 classes should be enough. An autoencoder is for data compression, and the output of the encoder is the hidden representation, which is usually not too useful for classification by itself. Also, when you downsample you usually increase the number of channels, so more like `[batch, 64, 2, 2]` if you wish. After the encoder you can add a classifying head (e.g. Linear layers `[64 * 2 * 2, 64, 8]`, where the last Linear is your 8-class classifier). – Szymon Maszke Sep 23 '19 at 21:07
  • Thank you very much for the suggestions. I will try it. – Composer Sep 23 '19 at 21:11
  • This won't work; I need unsupervised learning. I don't have target classes. – Composer Sep 23 '19 at 21:35
  • Okay, I get it. So on what basis do you want those 8 classes? What would they represent? – Szymon Maszke Sep 23 '19 at 21:37
  • Just unsupervised clustering. Grouping together similar instances. I have managed to implement something. I will post it. Thank you again. – Composer Sep 23 '19 at 22:38
  • More info on what I was attempting is found in https://stackoverflow.com/questions/40779282/can-i-use-autoencoder-for-clustering – Composer Sep 23 '19 at 22:47

I have managed to implement an autoencoder that provides unsupervised clustering (in my case, 8 classes).

This is not an expert solution. I owe thanks to @Szymon Maszke for the suggestions.

self.encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4),    # 84x84 -> 20x20
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),   # 20x20 -> 9x9
    nn.ReLU(),
    nn.Conv2d(64, 2, kernel_size=3, stride=1),    # 9x9 -> 7x7
    nn.ReLU(),
    nn.MaxPool2d(6, stride=1)                     # 7x7 -> 2x2, bottleneck [1, 2, 2, 2]
)

self.decoder = nn.Sequential(
    nn.ConvTranspose2d(2, 64, kernel_size=3, stride=1),   # 2x2 -> 4x4
    nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=8, stride=4),  # 4x4 -> 20x20
    nn.ReLU(),
    nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4)    # 20x20 -> 84x84
)
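As a quick sanity check of the arithmetic (a plain-Python sketch; no padding, dilation 1), the shapes now round-trip from 84x84 down to the 2x2 bottleneck and back:

```python
# Conv/pool output size: floor((in - k) / s) + 1.
def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

# Transposed-conv output size: (in - 1) * stride + kernel.
def deconv_out(size, kernel, stride):
    return (size - 1) * stride + kernel

# Encoder: 84 -> 20 -> 9 -> 7 -> 2
s = conv_out(84, 8, 4)
s = conv_out(s, 4, 2)
s = conv_out(s, 3, 1)
s = conv_out(s, 6, 1)
print(s)  # 2: bottleneck is [1, 2, 2, 2], i.e. 8 values

# Decoder: 2 -> 4 -> 20 -> 84, back to the input resolution
d = deconv_out(s, 3, 1)
d = deconv_out(d, 8, 4)
d = deconv_out(d, 8, 4)
print(d)  # 84
```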
Composer
  • What I was attempting: https://stackoverflow.com/questions/40779282/can-i-use-autoencoder-for-clustering – Composer Sep 23 '19 at 22:48