3

I am looking at this working variational autoencoder.

The main class

import torch
from torch import nn
from torch.autograd import Variable
from torch.nn import functional as F


class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()

        # Encoder: 784 -> 400 -> (20, 20); fc21 produces mu, fc22 produces logvar
        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20)
        self.fc22 = nn.Linear(400, 20)
        # Decoder: 20 -> 400 -> 784
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparametrize(self, mu, logvar):
        # Reparameterization trick: z = mu + std * eps, with eps ~ N(0, I)
        std = logvar.mul(0.5).exp_()
        if torch.cuda.is_available():
            eps = torch.cuda.FloatTensor(std.size()).normal_()
        else:
            eps = torch.FloatTensor(std.size()).normal_()
        eps = Variable(eps)
        return eps.mul(std).add_(mu)

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return F.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparametrize(mu, logvar)
        return self.decode(z), mu, logvar

has this decode method:

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return F.sigmoid(self.fc4(h3))

I can't explain to myself why the output of the last layer should be passed through a sigmoid before being returned.

Please explain.


EDIT: I just checked without the sigmoid. Results are still nice. Now I am not sure if it is needed or not.

Gulzar

3 Answers

8

As mentioned in the answer by Jim J, the sigmoid forces the output into the range [0, 1]. In this case, it's not because we want to interpret the output as a probability; rather, it's done so the output can be interpreted as the pixel intensities of a greyscale image.

If you remove the sigmoid, the NN will have to learn on its own that all the outputs should be in the range [0, 1]. The sigmoid might help make the learning process more stable.
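
One concrete reason the sigmoid matters is the reconstruction loss. The PyTorch VAE example that this model appears to come from pairs the decoder with a binary cross-entropy term, and F.binary_cross_entropy requires its first argument to lie in [0, 1], which the final sigmoid guarantees. A minimal sketch of such a loss (assumed, since the question doesn't show it) for flattened 784-pixel MNIST images:

import torch
from torch.nn import functional as F

def loss_function(recon_x, x, mu, logvar):
    # Reconstruction term: BCE expects recon_x in [0, 1], hence the sigmoid
    bce = F.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    # KL divergence between q(z|x) = N(mu, diag(sigma^2)) and the prior N(0, I)
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld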

Sandro H
  • Can we replace the sigmoid with a softmax function? What would the impact be? – MSS Jul 16 '21 at 19:10
  • @MSS Softmax takes a vector of inputs and returns a vector of outputs that sums to 1, while sigmoid maps each value independently to (0, 1). So no, they are not interchangeable. A softmax over two logits with one of them fixed at zero reduces to a sigmoid, but I wouldn't say they can be replaced. – Gulzar Jul 17 '21 at 08:17
  • Okay but I still can't explain it to myself. – MSS Jul 17 '21 at 13:43
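
A quick sketch of the distinction described in the comments above (the tensor here is just an illustrative example, not from the original code):

import torch

logits = torch.tensor([0.5, -1.0, 2.0])

# Sigmoid is applied element-wise; each output lies in (0, 1) independently,
# which is what you want for per-pixel intensities.
print(torch.sigmoid(logits))         # tensor([0.6225, 0.2689, 0.8808])

# Softmax normalizes across the whole vector so the outputs sum to 1,
# i.e. it produces a distribution over elements, not per-pixel values.
print(torch.softmax(logits, dim=0))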
1

If I remember correctly, it'll convert the results into a probability, expressed as a real number between 0 and 1.

Jim J
  • You do remember correctly, but that doesn't make sense here. The output should be the same as the input, as this is an autoencoder. The input is not probabilities. – Gulzar Dec 15 '20 at 14:31
  • That's not quite correct: the values do not represent a probability, the sigmoid just squishes them between 0 and 1. The softmax function returns probabilities, which would make no sense here. Sigmoid makes sense because the inputs (all pixels of MNIST) are in the range [0, 1], so the output has to be between 0 and 1 as well, as described in @Sandro H's answer. – Theodor Peifer Dec 15 '20 at 16:58
0

This is because the images you get from

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.ToTensor()),
    batch_size=args.batch_size, shuffle=True, **kwargs)

have pixel values in the range [0, 1]. You can verify this by adding print('data[0]: ', data[0]) here:

def test(epoch):
    model.eval()
    test_loss = 0
    with torch.no_grad():
        for i, (data, _) in enumerate(test_loader):
            data = data.to(device)
            print('data[0]: ', data[0])
            ...

Look at the printed output and you will find that those values range from 0 to 1. By the way, the first argument of torchvision.utils.save_image() also expects a tensor whose pixel values range from 0 to 1, because inside that function the values are multiplied by 255 before being saved as an image.
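
A quicker check than printing a whole image tensor (assuming data is a batch produced by the MNIST loader above) is to look at the extremes directly:

# Minimal sketch: transforms.ToTensor() scales the uint8 pixel values [0, 255]
# down to floats in [0, 1], so the min/max of a batch confirm the range.
print(data.min().item(), data.max().item())  # typically prints 0.0 1.0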

Wade Wang