In the variational autoencoder architecture we were using mu and sigma fully connected layers. These are followed by the latent variable layer, which samples from the Gaussian distribution parameterized by the mu and sigma of the layer below. In the cost function we use the KL divergence to ensure that the activations of the latent variable follow a unit Gaussian distribution.
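To make the setup concrete, here is a minimal sketch of the encoder path I am describing, written in PyTorch; the layer sizes (`input_dim`, `hidden_dim`, `latent_dim`) and class names are just illustrative, not from any particular implementation:

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        # The two fully connected layers the question refers to:
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # predict log(sigma^2) for numerical stability

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z ~ N(mu, sigma^2)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)
        return z, mu, logvar

def kl_divergence(mu, logvar):
    # KL(N(mu, sigma^2) || N(0, 1)): the cost-function term that pushes
    # the latent activations toward a unit Gaussian
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
```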
Therefore, since SELU is designed to keep activations close to a unit Gaussian, can we remove the mu and sigma fully connected layers and use the SELU activation function instead?
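Something like the following is what I have in mind (purely a sketch of the question, not a working recipe; it reuses the same hypothetical dimensions as above and drops the sampling step and the KL term entirely):

```python
import torch
import torch.nn as nn

class SELUEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        # Single deterministic latent layer; the idea is that SELU keeps
        # its activations near zero mean and unit variance on its own.
        self.fc_latent = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = torch.selu(self.hidden(x))
        z = torch.selu(self.fc_latent(h))  # no mu/sigma heads, no sampling
        return z
```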
Thanks!!