
In the variational autoencoder (VAE) architecture we use a mu and a sigma fully connected layer. These feed the latent-variable layer, which samples from a Gaussian distribution parameterized by the mu and sigma of the layer below. In the cost function, a KL divergence term encourages the latent variable's distribution to follow a unit Gaussian.
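For concreteness, here is a minimal sketch of the setup I mean, assuming PyTorch (the class name, layer names, and sizes are my own, not from any particular implementation):

```python
import torch
import torch.nn as nn

class VAEEncoderHead(nn.Module):
    """Mu/sigma heads plus the sampling step described above (hypothetical sizes)."""
    def __init__(self, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mu fully connected layer
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # sigma layer (as log-variance)

    def forward(self, h):
        mu = self.fc_mu(h)                 # no activation on these heads
        logvar = self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)        # reparameterization trick
        z = mu + eps * std                 # latent variable sampled from N(mu, sigma^2)
        return z, mu, logvar
```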

Since SELU is designed to push activations toward a unit Gaussian, can we remove the mu and sigma fully connected layers and use the SELU activation function instead?

Thanks!!

I. A

1 Answer


The activation function you use does not really matter here, because the last layer in your encoder network should not have any activation at all. So even though SELU drives activations toward a unit Gaussian distribution, the layer that generates the latent parameters has no activation on it, and its output will not be a unit Gaussian.

In addition, the output of the encoder network is the mean and variance of the latent variable's distribution, not the latent variable itself. What we want is for the predicted mean to be close to 0 and the predicted variance to be close to 1, not for the mean and variance outputs themselves to be unit-Gaussian distributed. SELU would push the distribution of the mean and variance toward a unit Gaussian, which does not make sense here.
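To see why the constraint acts on the predicted parameters rather than on their distribution, here is the standard closed-form KL term used in VAE training (a sketch in PyTorch; the function name is mine):

```python
import torch

def kl_to_unit_gaussian(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions.

    This term is minimized when mu = 0 and logvar = 0 (i.e. sigma = 1),
    so it constrains the values of the predicted mean and variance.
    An activation like SELU on those heads would instead shape the
    distribution of the outputs, which is not the same constraint.
    """
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
```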

DiveIntoML