
Here is an autoencoder trained on MNIST using PyTorch:

import torch
import torchvision
import torch.nn as nn
from torch.autograd import Variable

cuda = torch.cuda.is_available() # True if cuda is available, False otherwise
FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
print('Training on %s' % ('GPU' if cuda else 'CPU'))

# Loading the MNIST data set
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize((0.1307,), (0.3081,))])
mnist = torchvision.datasets.MNIST(root='../data/', train=True, transform=transform, download=True)

# Loader to feed the data batch by batch during training.
batch = 100
data_loader = torch.utils.data.DataLoader(mnist, batch_size=batch, shuffle=True)

autoencoder = nn.Sequential(
                # Encoder
                nn.Linear(28 * 28, 512),
                nn.PReLU(512),
                nn.BatchNorm1d(512),

                # Low-dimensional representation
                nn.Linear(512, 128),   
                nn.PReLU(128),
                nn.BatchNorm1d(128),

                # Decoder
                nn.Linear(128, 512),
                nn.PReLU(512),
                nn.BatchNorm1d(512),
                nn.Linear(512, 28 * 28))

autoencoder = autoencoder.type(FloatTensor)

optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.005)

epochs = 10
data_size = len(mnist)  # number of training images

for i in range(epochs):
    for j, (images, _) in enumerate(data_loader):
        images = images.view(images.size(0), -1) # from (batch, 1, 28, 28) to (batch, 784)
        images = Variable(images).type(FloatTensor)

        autoencoder.zero_grad()
        reconstructions = autoencoder(images)
        loss = torch.dist(images, reconstructions)
        loss.backward()
        optimizer.step()
    print('Epoch %i/%i loss %.2f' % (i + 1, epochs, loss.item()))

print('Optimization finished.')

I'm attempting to compare the lower-dimensional representation of each image.

Printing the shape of each layer's parameters:

for l in autoencoder.parameters():
    print(l.shape)

displays:

torch.Size([512, 784])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([128, 512])
torch.Size([128])
torch.Size([128])
torch.Size([128])
torch.Size([128])
torch.Size([512, 128])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([784, 512])
torch.Size([784])

So it appears the reduced representation is not stored in the learned parameters?

In other words, if I have 10,000 images each containing 100 pixels, running an autoencoder that reduces the dimensionality to 10 values should let me access the 10-dimensional representation of all 10,000 images?
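
For concreteness, a minimal sketch of the scenario I mean, with hypothetical sizes and a single untrained linear layer standing in for the encoder:

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(100, 10))  # hypothetical: 100 pixels -> 10 values

images = torch.randn(10000, 100)  # 10,000 images with 100 pixels each
codes = encoder(images)           # reduced representation of every image
print(codes.shape)                # torch.Size([10000, 10])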

  • You reduce the 28x28 image to a 128-dimensional space, and you print out tensors of size 128. So I don't understand why you say that the encoded image is not among the printed tensors? – user2653663 Jul 25 '18 at 10:18
  • @user2653663 do I have control – blue-sky Jul 25 '18 at 11:22
  • @user2653663 for a single image, how do I access its reduced-dimensionality representation? Since the encoder and decoder dimensions are the same, do I need to access the weights of the hidden layer to get at the reduced representation? – blue-sky Jul 25 '18 at 11:31

1 Answer


I'm not very familiar with PyTorch, but splitting the autoencoder into an encoder and a decoder model seems to work (I changed the size of the hidden layer from 512 to 64, and the dimension of the encoded image from 128 to 4, to make the example run faster):

import torch
import torchvision
import torch.nn as nn
from torch.autograd import Variable

cuda = torch.cuda.is_available() # True if cuda is available, False otherwise
FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
print('Training on %s' % ('GPU' if cuda else 'CPU'))

# Loading the MNIST data set
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize((0.1307,), (0.3081,))])
mnist = torchvision.datasets.MNIST(root='../data/', train=True, transform=transform, download=True)

# Loader to feed the data batch by batch during training.
batch = 100
data_loader = torch.utils.data.DataLoader(mnist, batch_size=batch, shuffle=True)


encoder = nn.Sequential(
                # Encoder
                nn.Linear(28 * 28, 64),
                nn.PReLU(64),
                nn.BatchNorm1d(64),

                # Low-dimensional representation
                nn.Linear(64, 4),
                nn.PReLU(4),
                nn.BatchNorm1d(4))

decoder = nn.Sequential(
                # Decoder
                nn.Linear(4, 64),
                nn.PReLU(64),
                nn.BatchNorm1d(64),
                nn.Linear(64, 28 * 28))

autoencoder = nn.Sequential(encoder, decoder)

encoder = encoder.type(FloatTensor)
decoder = decoder.type(FloatTensor)
autoencoder = autoencoder.type(FloatTensor)

optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.005)

epochs = 10
data_size = len(mnist)  # number of training images

for i in range(epochs):
    for j, (images, _) in enumerate(data_loader):
        images = images.view(images.size(0), -1) # from (batch, 1, 28, 28) to (batch, 784)
        images = Variable(images).type(FloatTensor)

        autoencoder.zero_grad()
        reconstructions = autoencoder(images)
        loss = torch.dist(images, reconstructions)
        loss.backward()
        optimizer.step()
    print('Epoch %i/%i loss %.2f' % (i + 1, epochs, loss.item()))

print('Optimization finished.')

# Get the encoded images here
encoder.eval()  # use BatchNorm's running statistics rather than per-batch statistics
encoded_images = []
for images, _ in data_loader:
    images = images.view(images.size(0), -1)  # flatten to (batch, 784)
    images = Variable(images).type(FloatTensor)

    encoded_images.append(encoder(images).detach())  # detach so the autograd graph is not kept
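
encoded_images is then a list of per-batch tensors. To compare individual images, one option (a sketch, not part of the original answer) is to concatenate the batches into a single matrix and measure distances between rows:

# Stack the per-batch encodings into one (num_images, 4) matrix.
all_encodings = torch.cat(encoded_images, dim=0)
print(all_encodings.shape)  # torch.Size([60000, 4]) for the MNIST training set

# Distance between the encodings of the first two images in loader order.
print(torch.dist(all_encodings[0], all_encodings[1]))

Note that the data_loader was built with shuffle=True, so the row order does not match the dataset order; a second DataLoader with shuffle=False would give a stable image-to-row mapping.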
  • Thanks for this. Where in your code is it specified that the layer weights are being accessed? I understand the encodings are stored in the list 'encoded_images', but what determines the hidden weights being used here? Is the reason that the last layer in the encoder is the 'Low-dimensional representation', so that when executing 'encoder(images)' the output of 'nn.Linear(64, 4)' is what's stored? – blue-sky Jul 26 '18 at 08:01
  • I'm not sure I understand your question. In the encoder, the image (of size 28^2) is first transformed to a 64-dimensional representation. Then the 64-dimensional representation is transformed to a 4-dimensional one (the encoded image). nn.Linear(64, 4) contains the weights required to transform 64 floating point values to 4 floating point values. So the encoder object contains all the network structure and trained weights needed to transform the image into a 4-dimensional encoded image (see the shape sketch after this thread). – user2653663 Jul 26 '18 at 09:22
  • I'm trying to understand how encoder(images) stores the reduced feature encodings. Looking at the code, the last line in the encoder is nn.BatchNorm1d(4), so is nn.BatchNorm1d(4) what is invoked by encoder(images) to store the encodings? – blue-sky Jul 26 '18 at 09:56
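
To make the shape bookkeeping in this thread concrete, here is a minimal sketch (the answer's encoder architecture with random, untrained weights): encoder(images) simply runs a forward pass, and the 4-dimensional encoding is the output of that pass; nn.BatchNorm1d(4) is just the last transformation applied, not a place where encodings are stored.

import torch
import torch.nn as nn

# The answer's encoder architecture, untrained, used only to trace shapes.
encoder = nn.Sequential(
    nn.Linear(28 * 28, 64), nn.PReLU(64), nn.BatchNorm1d(64),
    nn.Linear(64, 4), nn.PReLU(4), nn.BatchNorm1d(4))

x = torch.randn(100, 28 * 28)  # dummy batch of 100 flattened images
h = encoder[0](x)              # first Linear alone: (100, 784) -> (100, 64)
z = encoder(x)                 # full forward pass:  (100, 784) -> (100, 4)
print(h.shape, z.shape)        # torch.Size([100, 64]) torch.Size([100, 4])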