I'm currently trying to use an autoencoder for dimensionality reduction, i.e., using the bottleneck activations as the compressed features.
I noticed that many studies that use autoencoders for this task use a linear bottleneck layer.
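For concreteness, here is a minimal sketch of the kind of setup I mean. It is written in plain NumPy rather than a deep learning framework. The layer sizes (784 → 128 → 32 → 128 → 784), the random untrained weights, and the helper names `encode`/`decode` are all hypothetical and only show where the bottleneck activation sits. Passing the identity function gives a linear bottleneck.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical layer sizes: input -> hidden -> bottleneck -> hidden -> output
sizes = [784, 128, 32, 128, 784]
# Random, untrained weights: this only illustrates the data flow, not a trained model
weights = [rng.normal(0.0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def encode(x, bottleneck_act=lambda z: z):
    """Map inputs to the bottleneck code; the default identity = linear bottleneck."""
    h = relu(x @ weights[0] + biases[0])                  # non-linear hidden layer
    return bottleneck_act(h @ weights[1] + biases[1])     # bottleneck layer

def decode(code):
    """Map bottleneck codes back to input space."""
    h = relu(code @ weights[2] + biases[2])               # non-linear hidden layer
    return h @ weights[3] + biases[3]                     # linear output for real-valued data

x = rng.normal(size=(5, 784))
code = encode(x)        # compressed features, shape (5, 32)
recon = decode(code)    # reconstruction, shape (5, 784)
relu_code = encode(x, bottleneck_act=relu)  # same network with a ReLU bottleneck
```

Only the bottleneck activation differs between `code` and `relu_code`; everything else in the network is shared.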
Intuitively, this makes sense to me, since a non-linear activation function may reduce the bottleneck features' capacity to represent the principal information contained in the original features (e.g., ReLU discards negative values, and sigmoid saturates for inputs that are very large or very small).
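To make that intuition concrete, the toy example below uses two made-up pre-activation bottleneck values for two different inputs. The numbers are hypothetical. After ReLU, the negative column collapses to the same value. After sigmoid, the large positive column becomes almost indistinguishable. The linear (identity) version keeps both distinct. This only illustrates the concern; whether it matters in practice depends on where the trained pre-activations actually fall.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation bottleneck values: one row per input, one column per unit
z = np.array([[-3.0,  8.0],
              [-0.5, 12.0]])

linear = z                   # identity activation: both units still separate the inputs
relu = np.maximum(z, 0.0)    # ReLU clips the negative column to 0 for both inputs
sig = sigmoid(z)             # sigmoid squashes the large column to values near 1

print(relu[:, 0])  # [0. 0.] -> unit 0 no longer distinguishes the two inputs
print(sig[:, 1])   # both entries are within 1e-3 of 1
```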
However, is this reasoning correct? And is a linear bottleneck layer actually necessary for an autoencoder?
If it's possible to use a non-linear bottleneck layer, what activation function would be the best choice?
Thanks.