Sorry, but your code is a mess... And if it's just to showcase the autoencoder idea (here you just have X, Y, Z coordinates, yet you name it image), it's chosen pretty poorly.
To get this out of the way: if it's actually an image, you won't be able to encode it as a single pixel; that needs a little more sophistication.
Source code
"Here is a simple autoencoder to encode 3 vectors of dimension 1x3: [1,2,3], [1,2,3], [100,200,500] to 1x1"
Which is true only in this case, as you have a batch of 3 elements (while you named the batch dimension the out_features of the network!). The vectors' dimension is not 1x3, it's just 3. Here is a Minimal Reproducible Example with commentary:
import torch
# Rows are batches, there could be 3, there could be a thousand
data = torch.tensor([[1, 2, 3], [1, 2, 3], [100, 200, 500]]).float()
# 3 input features, columns of data
encoder = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Sigmoid())
decoder = torch.nn.Sequential(torch.nn.Linear(1, 3), torch.nn.Sigmoid())
autoencoder = torch.nn.Sequential(encoder, decoder)
optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.001)
epochs = 10000
for i in range(epochs):
    optimizer.zero_grad()
    reconstructions = autoencoder(data)
    loss = torch.dist(data, reconstructions)
    loss.backward()
    optimizer.step()
    # Print loss every 100 epochs
    if i % 100 == 0:
        print(loss)
Will it work?
This one is more interesting. In principle, once your neural network is trained, you don't have to retrain it to handle examples it didn't previously see (as the goal of a neural network is to learn patterns that solve the task in general).
In your case it won't.
Why won't it work?
First of all, you have a sigmoid activation in the decoder, which restricts the output to the [0, 1] range. You are trying to predict data that lies outside this range, so it's impossible.
Without even running it, I can tell you what value the loss of this example will go towards (with all weights tending to +inf). All predictions will always be [1, 1, 1] (or as close to it as possible), as that value penalizes the network the least, so you just have to calculate the distance of each vector in data to [1, 1, 1]. Here the loss gets stuck around 546.2719. Weights and biases are around 10 (which is pretty huge for a sigmoid) after 100000 epochs. Your values may vary, but the trend is clear (though it will stop there, as 10 is already pretty close to 1 once you squash it with the sigmoid).
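You can sanity-check that number yourself: the best a sigmoid-capped decoder can do is output all ones, and the distance from data to an all-ones tensor is exactly that loss (reusing data and torch.dist from the snippet above):
import torch

data = torch.tensor([[1, 2, 3], [1, 2, 3], [100, 200, 500]]).float()
# Distance from data to the best prediction a sigmoid-capped decoder can make (all ones)
print(torch.dist(data, torch.ones_like(data)))  # tensor(546.2719)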
Removing torch.nn.Sigmoid from decoder
What if we remove torch.nn.Sigmoid() from the decoder? It will learn to almost perfectly reconstruct just your 3 examples, with the loss reaching 0.002 after "only" 500000 epochs.
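For reference, that's the only change to the snippet above (redefine the decoder without the Sigmoid and rebuild the autoencoder; everything else stays as-is):
# Decoder without the final squashing, so outputs are unbounded
decoder = torch.nn.Sequential(torch.nn.Linear(1, 3))
autoencoder = torch.nn.Sequential(encoder, decoder)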
Here are the learned weights of the decoder:
tensor([[ 99.0000],
        [198.0000],
        [496.9999]], requires_grad=True)
And here is the bias:
tensor([1.0000, 2.0000, 2.9999])
And here is the output of the encoder for each example:
tensor([[2.2822e-13],
        [2.2822e-13],
        [1.0000e+00]])
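If you want to reproduce those printouts yourself, the trained modules can be inspected directly (a sketch; decoder[0] is the Linear layer inside the Sequential defined above):
# Peek at what the trained network learned
print(decoder[0].weight)       # decoder weights, shape (3, 1)
print(decoder[0].bias)         # decoder bias, shape (3,)
print(encoder(data).detach())  # latent code for each of the 3 examples, shape (3, 1)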
Analysis of results
Your network learned just what you told it to learn, which is... magnitude (+ clever bias hackery).
[1, 2, 3] vector
Take the [1, 2, 3] example (repeated twice). Its encoding is 2e-13, which goes towards zero, so we will assume it's zero.
Now, multiply 0 by all the weights and you still get zero. Add the bias, which is [1.0, 2.0, 2.9999], and you magically get your input reconstructed.
[100, 200, 500] vector
You can probably see where this is going.
The encoded value is 1.0; when multiplied by the decoder weights, we get [99.0, 198.0, 497.0]. Add the bias to it and voila, we get our [100.0, 200.0, 500.0].
[1, 1, 1] vector
In your case it obviously will not work, as the magnitude of [1, 1, 1] is really small, hence it will be encoded as zero and reconstructed as [1, 2, 3].
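You can redo this arithmetic by hand with the (rounded) weights and bias printed above; the exact numbers come from my training run and may differ slightly on yours:
import torch

weights = torch.tensor([[99.0], [198.0], [497.0]])  # learned decoder weights, rounded
bias = torch.tensor([1.0, 2.0, 3.0])                # learned decoder bias, rounded
# Latent codes: ~0 for [1, 2, 3] (and for the unseen [1, 1, 1]), 1 for [100, 200, 500]
codes = torch.tensor([[0.0], [1.0]])
print(codes @ weights.T + bias)
# tensor([[  1.,   2.,   3.],
#         [100., 200., 500.]])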
Removing torch.nn.Sigmoid from encoder
A little off-topic, but when you remove the sigmoid from the encoder it won't be able to learn this pattern as "easily". The reason is that the network has to be more conservative with its weights (as those are no longer squashed). You would have to drop the learning rate (preferably lowering it continuously as training progresses), as it becomes unstable at some point (when trying to hit "the perfect spot").
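One way to get that continuously lowered learning rate is a scheduler. This is just a sketch of the idea (the choice of ExponentialLR and its gamma are mine, not something the original training used), reusing data and epochs from the first snippet:
# Neither encoder nor decoder squashes its output now
encoder = torch.nn.Sequential(torch.nn.Linear(3, 1))
decoder = torch.nn.Sequential(torch.nn.Linear(1, 3))
autoencoder = torch.nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.001)
# Multiply the learning rate by gamma after every epoch to keep training stable
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9999)

for i in range(epochs):
    optimizer.zero_grad()
    loss = torch.dist(data, autoencoder(data))
    loss.backward()
    optimizer.step()
    scheduler.step()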
Learning similarity
It's hard (at least for the network) to define "similar" in this case. Is [1, 2, 3] similar to [3, 2, 1]? The network has no concept of different dimensions and is required to squash those three numbers into a single value (later used for reconstruction).
As demonstrated, it would probably learn some implicit patterns in your data and become good at reconstructing "at least something", but it won't find the general pattern you are looking for. It still depends on your data and its properties, but I would argue for "no" in general, and I think its generalization capabilities would be poor.
And as you've seen in the analysis above, the neural network is pretty good at finding such patterns even when you didn't see them (or maybe you did and that's what you were after?) or when they don't exist at all.
If you need dimension-wise similarity (and it's not just a thought experiment), you have a lot of "human-made" stuff like the p-norm, or some encodings (those also measure similarity, but in a different way), so it's better to go for that IMO.
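For example, comparing [1, 2, 3] with [3, 2, 1] using a couple of such hand-made measures (torch.dist for p-norms; cosine similarity is just one more common option I'm adding here, not something specific to your problem):
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([3.0, 2.0, 1.0])

print(torch.dist(a, b, p=2))  # Euclidean (2-norm) distance: tensor(2.8284)
print(torch.dist(a, b, p=1))  # Manhattan (1-norm) distance: tensor(4.)
print(torch.nn.functional.cosine_similarity(a, b, dim=0))  # tensor(0.7143)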