
I'm building a convolutional autoencoder, but I want the encoding to be in a linear form so I can more easily feed it as input into an MLP. The encoder has two convolutional layers followed by a linear layer to reduce the dimension. This encoding is then fed into the corresponding decoder.

When I flatten the output of the second convolutional layer, my calculation (using the standard formula from Calculate the Output size in Convolution layer) says it should come out to a 1x100352 rank-1 tensor. However, when I set the input dimension of the linear layer to 100352, the flattened rank-1 tensor has dimension 1x50176. Then comes the weird part.
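For reference, here's the arithmetic I'm doing (a quick sketch assuming stride 1 and no padding, the nn.Conv2d defaults):

#Standard conv output size: (input - kernel) // stride + 1
side = 32                      #CIFAR10 images are 32x32
side = (side - 3) // 1 + 1     #after enc1: 30
side = (side - 3) // 1 + 1     #after enc2: 28
perImage = 4 * side * side     #4 output channels -> 3136 values per image
print(perImage * 32)           #100352 for a batch of 32 images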

I tried changing the input dimension of the linear layer to 50176, assuming I had miscalculated. When I do this, the reshaped rank-1 tensor confusingly becomes 1x100352, and the linear layer's weight matrix becomes 50176x256 as expected.

This response to modifying the linear layer's input dimension doesn't make sense to me. That hyperparameter correctly determines the shape of the weight matrix, but I don't see why it should have any bearing on the size of the linear layer's input, since that input is just a reshaped tensor from a convolutional layer whose hyperparameters are unrelated to the one in question.
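For context, here's the kind of shape check I've been doing (a quick sketch that assumes the model and trainLoader from the example below, run on CPU):

import torch.nn.functional as F

#Push one batch through the two encoder convolutions by hand
data, _ = next(iter(trainLoader))
print(data.shape)                 #torch.Size([32, 3, 32, 32])
h = F.relu(model.enc2(F.relu(model.enc1(data))))
print(h.shape)                    #torch.Size([32, 4, 28, 28])
print(h.reshape(1, -1).shape)     #torch.Size([1, 100352])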

I apologize if I'm just missing something obvious. I'm very new to PyTorch, and I couldn't find any other posts discussing this sort of issue.

Here's what I believe to be a minimal reproducible example:

import torch
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torchvision import datasets
from torch.utils.data import DataLoader

class convAutoEncoder(nn.Module):
        def __init__(self,**kwargs):
                super().__init__()
                #Creating network structure
                #Encoder portion of autoencoder
                self.enc1 = nn.Conv2d(in_channels = kwargs["inputChannels"], out_channels = kwargs["channelsEncoderMid"], kernel_size = kwargs["kernelSize"])
                self.enc2 = nn.Conv2d(in_channels = kwargs["channelsEncoderMid"], out_channels = kwargs["channelsEncoderInner"], kernel_size = kwargs["kernelSize"])
                self.enc3 = nn.Linear(in_features = kwargs["intoLinear"], out_features = kwargs["linearEncoded"])
                #Decoder portion of autoencoder
                self.dec1 = nn.Linear(in_features = kwargs["linearEncoded"], out_features = kwargs["intoLinear"])
                self.dec2 = nn.ConvTranspose2d(in_channels = kwargs["channelsEncoderInner"], out_channels = kwargs["channelsDecoderMid"], kernel_size = kwargs["kernelSize"])
                self.dec3 = nn.ConvTranspose2d(in_channels = kwargs["channelsDecoderMid"], out_channels = kwargs["inputChannels"], kernel_size = kwargs["kernelSize"])

        def forward(self,x):
                #Encoding
                x = F.relu(self.enc1(x))
                x = F.relu(self.enc2(x))
                x = x.reshape(1,-1)
                x = x.squeeze()
                x = F.relu(self.enc3(x))
                #Decoding
                x = F.relu(self.dec1(x))
                x = x.reshape([32,4,28,28])
                x = F.relu(self.dec2(x))
                x = F.relu(self.dec3(x))

                return x

def encodeDecodeConv(numEpochs = 20, input_Channels = 3, batchSize = 32, 
channels_Encoder_Inner = 4, channels_Encoder_Mid = 8, into_Linear = 100352, 
linear_Encoded = 256,  channels_Decoder_Mid = 8, kernel_Size = 3, 
learningRate = 1e-3):
        #Pick a device. If GPU available, use that. Otherwise, use CPU.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        #Define data transforms
        transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
        #Define training dataset
        trainSet = datasets.CIFAR10(root = './data', train = True, download = True, transform = transform)
        #Define testing dataset
        testSet = datasets.CIFAR10(root = './data', train = False, download = True, transform = transform)
        #Define data loaders
        trainLoader = DataLoader(trainSet, batch_size = batchSize, shuffle = True)
        testLoader = DataLoader(testSet, batch_size = batchSize, shuffle = True)
        #Initialize neural network and move it to the chosen device
        model = convAutoEncoder(inputChannels = input_Channels, channelsEncoderMid = channels_Encoder_Mid, channelsEncoderInner = channels_Encoder_Inner, intoLinear = into_Linear, linearEncoded = linear_Encoded, channelsDecoderMid = channels_Decoder_Mid, kernelSize = kernel_Size).to(device)
        #Optimization setup
        criterion = nn.MSELoss()
        optimizer = optim.Adam(model.parameters(),lr = learningRate)
        lossTracker = []
        for epoch in range(numEpochs):
                loss = 0
                for data,_ in trainLoader:
                        data = data.to(device)
                        optimizer.zero_grad()
                        outputs = model(data)
                        train_loss = criterion(outputs,data)
                        train_loss.backward()
                        optimizer.step()
                        loss += train_loss.item()
                loss = loss/len(trainLoader)
                print('Epoch {} of {}, Train loss: {:.3f}'.format(epoch+1,numEpochs,loss))


encodeDecodeConv()

Edit 2: Somewhere in the CIFAR10 data, the tensors coming out of the loader appear to change dimension. After playing around with more print statements, I discovered that setting the relevant hyperparameter to 100352 works fine for many batches, but then one batch pops up that has a different size. Not sure why that would occur, though.
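To narrow it down, here's the check I'd run next (a sketch reusing the trainLoader from above):

from collections import Counter

#Tally every batch shape seen in one pass over the training loader
shapes = Counter(tuple(data.shape) for data, _ in trainLoader)
print(shapes)
#I'd expect mostly (32, 3, 32, 32); a single odd shape would account for
#the 50176, since DataLoader keeps a smaller final batch by default
#(drop_last = False) when the dataset size isn't divisible by the batch size.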

Bryce
  • Can you please provide a minimal, reproducible example https://stackoverflow.com/help/minimal-reproducible-example. – sgillen Aug 27 '20 at 21:38
  • Added. Terribly silly of me to not include one. Thanks for the reminder. :) – Bryce Aug 27 '20 at 21:58

0 Answers