I have a single-label multiclass data set (the MNIST dataset) and I want to build a deep neural network classifier on it. It is obvious that the activation function on the last layer will be softmax, but I am curious which activation function (ReLU, sigmoid, tanh) I should use in the layers before the last one. Please also give an intuition behind the choice.
1 Answer
You can use any of the three you listed, and others as well. That said, ReLU is faster to compute than sigmoid or tanh, and so is its derivative, which makes a noticeable difference to training and inference time for neural networks: only a constant factor, but constants can matter. The main reason ReLU is generally preferred, however, is that it is less susceptible to the vanishing gradient problem: its gradient is 1 for all positive inputs, whereas the gradients of sigmoid and tanh saturate toward 0 for inputs of large magnitude, which slows learning in deep networks.
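As a concrete illustration, here is a minimal Keras sketch of what this looks like in practice (not part of the original answer; the layer sizes, optimizer, and number of epochs are arbitrary choices): ReLU in the hidden layers, softmax on the output layer.

```python
import tensorflow as tf
from tensorflow import keras

# Load MNIST and flatten the 28x28 images into 784-dimensional vectors in [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(256, activation="relu"),    # hidden layers use ReLU
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # softmax over the 10 digit classes
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels, single-label multiclass
    metrics=["accuracy"],
)

model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
model.evaluate(x_test, y_test)
```

You could swap `"relu"` for `"sigmoid"` or `"tanh"` in the hidden layers and the model would still train on a shallow network like this; the difference becomes more pronounced as you stack more layers.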