
I have a single-label, multi-class dataset: the MNIST dataset. I want to build a deep neural network classifier on it. It is obvious that the activation function on the last layer will be softmax, but I am curious which activation function (ReLU, sigmoid, tanh) I should use in the layers before the last one. Please also give the intuition behind the choice.

1 Answer

You can use any of the three you mentioned, and others as well. That said, ReLU is faster to compute than sigmoid or tanh, and its derivative is also cheaper to evaluate. This makes a noticeable difference to training and inference time for neural networks; it is only a constant factor, but constants can matter. The main reason ReLU is generally preferred, however, is that it is less susceptible to the vanishing gradient problem: its derivative is 1 for positive inputs, whereas the sigmoid's derivative is at most 0.25, so gradients propagated backward through many sigmoid layers shrink rapidly.
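A minimal sketch of what this looks like in practice, assuming Keras/TensorFlow (the layer sizes, optimizer, and epoch count here are illustrative choices, not anything prescribed by the question): ReLU in the hidden layers, softmax on the output layer because the task is single-label multi-class.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),    # 784 input features
    layers.Dense(256, activation="relu"),    # hidden layer 1: ReLU
    layers.Dense(128, activation="relu"),    # hidden layer 2: ReLU
    layers.Dense(10, activation="softmax"),  # output: softmax over 10 classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```

Swapping `"relu"` for `"sigmoid"` or `"tanh"` in the hidden layers will still train on a network this shallow; the difference mainly shows up in deeper networks, where the shrinking sigmoid/tanh gradients slow or stall learning in the early layers. Only the softmax on the output layer is fixed by the single-label multi-class setup.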

Mrityu