I have a single-label multiclass data set (the MNIST dataset) and I want to build a deep neural network classifier on it. It is obvious that the activation function on the last layer will be softmax, but I am curious which activation function (ReLU, sigmoid, tanh) I should use in the layers before the last one. Please also give an intuition behind the choice.
1 Answer
You can use any of the three you listed, and others as well. That said, ReLU is faster to compute than sigmoid or tanh, and so is its derivative, which makes a noticeable difference to training and inference time for neural networks: only a constant factor, but constants can matter. The main reason ReLU is generally preferred, however, is that it is less susceptible to the vanishing gradient problem: its gradient is 1 for all positive inputs, whereas the gradients of sigmoid and tanh saturate toward 0 for inputs of large magnitude, which slows learning in deep networks.
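As a concrete illustration, here is a minimal Keras sketch of what this looks like in practice (not part of the original answer; the layer sizes, optimizer, and number of epochs are arbitrary choices): ReLU in the hidden layers, softmax on the output layer.

```python
import tensorflow as tf
from tensorflow import keras

# Load MNIST and flatten the 28x28 images into 784-dimensional vectors in [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(256, activation="relu"),    # hidden layers use ReLU
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # softmax over the 10 digit classes
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels, single-label multiclass
    metrics=["accuracy"],
)

model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
model.evaluate(x_test, y_test)
```

You could swap `"relu"` for `"sigmoid"` or `"tanh"` in the hidden layers and the model would still train on a shallow network like this; the difference becomes more pronounced as you stack more layers.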