
I'm currently trying to use an autoencoder network for dimensionality reduction (i.e., using the bottleneck activation as the compressed feature).
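
For concreteness, here is a minimal sketch of this kind of setup (PyTorch is assumed, and the layer sizes are arbitrary placeholders): the bottleneck layer has no activation, so the compressed feature is a linear projection of the encoder output.

```python
# Minimal sketch: an autoencoder whose bottleneck has no activation,
# so the compressed feature z is a linear projection of the encoder output.
# Layer sizes (784 -> 128 -> 32) are placeholders, not a recommendation.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, bottleneck_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, bottleneck_dim),  # linear bottleneck: no activation here
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128),
            nn.ReLU(),
            nn.Linear(128, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)           # z is used as the compressed feature
        return self.decoder(z), z

model = Autoencoder()
x = torch.randn(16, 784)              # a dummy batch of flattened inputs
reconstruction, z = model(x)
print(z.shape)                        # torch.Size([16, 32])
```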

I noticed that a lot of studies that use an autoencoder for this task use a linear bottleneck layer.

Intuitively, this makes sense to me, since a non-linear activation function may reduce the bottleneck feature's ability to represent the principal information contained in the original feature (e.g., ReLU discards negative values and sigmoid squashes values that are very high or very low).

However, is this correct? And is using a linear bottleneck layer in an autoencoder necessary?

If it's possible to use a non-linear bottleneck layer, what activation function would be the best choice?

Thanks.

whkang

1 Answer


No, you are not limited to linear activation functions. An example is this work, where the hidden state of GRU layers is used as an embedding of the input; that hidden state is computed using non-linear tanh and sigmoid functions.
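
As a rough sketch of that idea (PyTorch assumed; this is not the code from the cited work), the final hidden state of a GRU can serve directly as a fixed-size embedding of a sequence:

```python
# Sketch: the final GRU hidden state as a fixed-size embedding of a sequence.
# The GRU's gates apply tanh and sigmoid internally, yet the resulting hidden
# state still works as an embedding of the input.
import torch
import torch.nn as nn

gru = nn.GRU(input_size=20, hidden_size=64, batch_first=True)

x = torch.randn(8, 50, 20)   # batch of 8 sequences, 50 time steps, 20 features
_, h_n = gru(x)              # h_n has shape (num_layers, batch, hidden_size)
embedding = h_n[-1]          # shape (8, 64): one vector per sequence

print(embedding.shape)
```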

Also, there is nothing wrong with 'ignoring' negative values. The resulting sparsity may in fact be beneficial and can enhance the representation. The small non-zero outputs produced by other functions, such as the identity or the sigmoid, may introduce false dependencies where there are none. With ReLU, the lack of a dependency is represented properly as an exact zero, rather than as the near-zero value a sigmoid would typically produce.
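
A small illustration of that point (values chosen arbitrarily): ReLU maps every non-positive pre-activation to an exact zero, while a sigmoid maps it to a small but non-zero value.

```python
# Illustration with arbitrary values: ReLU produces exact zeros for
# non-positive pre-activations, while sigmoid produces small non-zero values.
import torch

pre_activation = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])

relu_out = torch.relu(pre_activation)        # [0.0000, 0.0000, 0.0000, 0.5000, 3.0000]
sigmoid_out = torch.sigmoid(pre_activation)  # [0.0474, 0.3775, 0.5000, 0.6225, 0.9526]

print("exact zeros (ReLU):   ", (relu_out == 0).sum().item())     # 3
print("exact zeros (sigmoid):", (sigmoid_out == 0).sum().item())  # 0
```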

Aechlys