Sorry, but your code is a mess... And if it's just to showcase the autoencoder idea (here you just have X, Y, Z coordinates, yet you name it image), it's chosen pretty poorly.
To get this out of the way: if it's actually an image, you won't be able to encode it as a single pixel; that needs a little more sophistication.
Source code
"Here is a simple autoencoder to encode 3 vectors of dimension 1x3: [1,2,3], [1,2,3], [100,200,500] to 1x1"
Which is true only in this case, as you have a batch of 3 elements (while you named the batch dimension the out_features of the network!). The vectors' dimension is not 1x3, it's just 3. Here is a Minimal Reproducible Example with commentary:
import torch
# Rows are batches, there could be 3, there could be a thousand
data = torch.tensor([[1, 2, 3], [1, 2, 3], [100, 200, 500]]).float()
# 3 input features, columns of data
encoder = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Sigmoid())
decoder = torch.nn.Sequential(torch.nn.Linear(1, 3), torch.nn.Sigmoid())
autoencoder = torch.nn.Sequential(encoder, decoder)
optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.001)
epochs = 10000
for i in range(epochs):
    optimizer.zero_grad()
    reconstructions = autoencoder(data)
    loss = torch.dist(data, reconstructions)
    loss.backward()
    optimizer.step()
    # Print loss every 100 epochs
    if i % 100 == 0:
        print(loss)
Will it work?
This one is more interesting. In principle, once your neural network is trained, you don't have to retrain it to handle examples it didn't previously see (as the goal of a neural network is to learn patterns that solve the task in general).
In your case it won't.
Why won't it work?
First of all, you have a sigmoid activation in the decoder, which restricts the output to the [0, 1] range. You are trying to predict data that lies outside this range, so it's impossible.
Without even running it, I can tell you what value the loss of this example will go towards (with all weights tending to +inf). All predictions will always be [1, 1, 1] (or as close to it as possible), as that value penalizes the network the least, so you just have to calculate the distance of each vector in data to [1, 1, 1]. Here the loss gets stuck around 546.2719. Weights and biases are around 10 (which is pretty huge for a sigmoid) after 100000 epochs. Your values may vary, but the trend is clear (though it will stop there, as 10 is already pretty close to 1 once you squash it with the sigmoid).
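You can sanity-check that number yourself: the best a sigmoid-capped decoder can do is output all ones, and the distance from data to an all-ones tensor is exactly that loss (reusing data and torch.dist from the snippet above):
import torch

data = torch.tensor([[1, 2, 3], [1, 2, 3], [100, 200, 500]]).float()
# Distance from data to the best prediction a sigmoid-capped decoder can make (all ones)
print(torch.dist(data, torch.ones_like(data)))  # tensor(546.2719)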
Removing torch.nn.Sigmoid from decoder
What if we remove torch.nn.Sigmoid() from the decoder? It will learn to almost perfectly reconstruct just your 3 examples, with the loss reaching 0.002 after "only" 500000 epochs.
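For reference, that's the only change to the snippet above (redefine the decoder without the Sigmoid and rebuild the autoencoder; everything else stays as-is):
# Decoder without the final squashing, so outputs are unbounded
decoder = torch.nn.Sequential(torch.nn.Linear(1, 3))
autoencoder = torch.nn.Sequential(encoder, decoder)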
Here are the learned weights of the decoder:
tensor([[ 99.0000],
        [198.0000],
        [496.9999]], requires_grad=True)
And here is the bias:
tensor([1.0000, 2.0000, 2.9999])
And here is the output of the encoder for each example:
tensor([[2.2822e-13],
        [2.2822e-13],
        [1.0000e+00]])
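If you want to reproduce those printouts yourself, the trained modules can be inspected directly (a sketch; decoder[0] is the Linear layer inside the Sequential defined above):
# Peek at what the trained network learned
print(decoder[0].weight)       # decoder weights, shape (3, 1)
print(decoder[0].bias)         # decoder bias, shape (3,)
print(encoder(data).detach())  # latent code for each of the 3 examples, shape (3, 1)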
Analysis of results
Your network learned just what you told it to learn, which is... magnitude (+ clever bias hackery).
[1, 2, 3] vector
Take the [1, 2, 3] example (repeated twice). Its encoding is 2e-13, which goes towards zero, so we will assume it's zero.
Now, multiply 0 by all the weights and you still get zero. Add the bias, which is [1.0, 2.0, 2.9999], and you magically get your input reconstructed.
[100, 200, 500] vector
You can probably see where this is going.
The encoded value is 1.0; when multiplied by the decoder weights, we get [99.0, 198.0, 497.0]. Add the bias to it and voila, we get our [100.0, 200.0, 500.0].
[1, 1, 1] vector
In your case it obviously will not work, as the magnitude of [1, 1, 1] is really small, hence it will be encoded as zero and reconstructed as [1, 2, 3].
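You can redo this arithmetic by hand with the (rounded) weights and bias printed above; the exact numbers come from my training run and may differ slightly on yours:
import torch

weights = torch.tensor([[99.0], [198.0], [497.0]])  # learned decoder weights, rounded
bias = torch.tensor([1.0, 2.0, 3.0])                # learned decoder bias, rounded
# Latent codes: ~0 for [1, 2, 3] (and for the unseen [1, 1, 1]), 1 for [100, 200, 500]
codes = torch.tensor([[0.0], [1.0]])
print(codes @ weights.T + bias)
# tensor([[  1.,   2.,   3.],
#         [100., 200., 500.]])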
Removing torch.nn.Sigmoid from encoder
A little off-topic, but when you remove the sigmoid from the encoder it won't be able to learn this pattern as "easily". The reason is that the network has to be more conservative with its weights (as those are no longer squashed). You would have to drop the learning rate (preferably lowering it continuously as training progresses), as it becomes unstable at some point (when trying to hit "the perfect spot").
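One way to get that continuously lowered learning rate is a scheduler. This is just a sketch of the idea (the choice of ExponentialLR and its gamma are mine, not something the original training used), reusing data and epochs from the first snippet:
# Neither encoder nor decoder squashes its output now
encoder = torch.nn.Sequential(torch.nn.Linear(3, 1))
decoder = torch.nn.Sequential(torch.nn.Linear(1, 3))
autoencoder = torch.nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.001)
# Multiply the learning rate by gamma after every epoch to keep training stable
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9999)

for i in range(epochs):
    optimizer.zero_grad()
    loss = torch.dist(data, autoencoder(data))
    loss.backward()
    optimizer.step()
    scheduler.step()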
Learning similarity
It's hard (at least for the network) to define "similar" in this case. Is [1, 2, 3] similar to [3, 2, 1]? The network has no concept of different dimensions and is required to squash those three numbers into a single value (later used for reconstruction).
As demonstrated, it would probably learn some implicit patterns in your data and become good at reconstructing "at least something", but it won't find the general pattern you are looking for. It still depends on your data and its properties, but I would argue for "no" in general, and I think its generalization capabilities would be poor.
And as you've seen in the analysis above, the neural network is pretty good at finding such patterns even when you didn't see them (or maybe you did and that's what you were after?) or when they don't exist at all.
If you need dimension-wise similarity (and it's not just a thought experiment), you have a lot of "human-made" stuff like the p-norm, or some encodings (those also measure similarity, but in a different way), so it's better to go for that IMO.
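For example, comparing [1, 2, 3] with [3, 2, 1] using a couple of such hand-made measures (torch.dist for p-norms; cosine similarity is just one more common option I'm adding here, not something specific to your problem):
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([3.0, 2.0, 1.0])

print(torch.dist(a, b, p=2))  # Euclidean (2-norm) distance: tensor(2.8284)
print(torch.dist(a, b, p=1))  # Manhattan (1-norm) distance: tensor(4.)
print(torch.nn.functional.cosine_similarity(a, b, dim=0))  # tensor(0.7143)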