
I am trying to use leaky_relu as the activation function for my hidden layers. The parameter alpha is explained as:

slope of the activation function at x < 0

What does this mean? What effect will different values of alpha have on the results of the model?

Anthony0202

1 Answer


A deep explanation of ReLU and its variants can be found at the following links:

  1. https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
  2. https://medium.com/@himanshuxd/activation-functions-sigmoid-relu-leaky-relu-and-softmax-basics-for-neural-networks-and-deep-8d9c70eed91e

The main drawback of regular ReLU is that the input to the activation can be negative, due to the operations performed in the network, which leads to what is referred to as the "dying ReLU" problem:

the gradient is 0 whenever the unit is not active. This could lead to cases where a unit never activates as a gradient-based optimization algorithm will not adjust the weights of a unit that never activates initially. Further, like the vanishing gradients problem, we might expect learning to be slow when training ReLU networks with constant 0 gradients.

Leaky ReLU replaces that zero slope for negative inputs with some small value, say 0.001 (referred to as "alpha"), so the function becomes f(x) = max(alpha * x, x), e.g. f(x) = max(0.001x, x). The gradient for x < 0 is then alpha rather than 0, so gradient descent keeps updating the unit instead of reaching a dead end.
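
To make the effect of alpha concrete, here is a minimal NumPy sketch of the formula above (just an illustration, not code from the question):

    import numpy as np

    def relu(x):
        # Standard ReLU: negative inputs are clamped to 0, so their gradient is 0.
        return np.maximum(0.0, x)

    def leaky_relu(x, alpha=0.01):
        # Leaky ReLU: negative inputs keep a small slope alpha instead of being zeroed.
        return np.maximum(alpha * x, x)

    def leaky_relu_grad(x, alpha=0.01):
        # The gradient is 1 for x > 0 and alpha for x <= 0, so it never vanishes completely.
        return np.where(x > 0, 1.0, alpha)

    x = np.array([-2.0, -0.5, 0.5, 2.0])
    print(relu(x))                   # negatives are zeroed: [0, 0, 0.5, 2]
    print(leaky_relu(x, 0.01))       # negatives are scaled by alpha: [-0.02, -0.005, 0.5, 2]
    print(leaky_relu_grad(x, 0.01))  # gradient is alpha, not 0, on the negative side: [0.01, 0.01, 1, 1]

In Keras the same idea is exposed through the LeakyReLU layer, whose alpha argument is exactly this negative-side slope; a larger alpha gives stronger gradients (and larger outputs) for negative inputs, while alpha = 0 recovers plain ReLU.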

David
    A tiny quibble with this answer: The suggested alpha 0.001 is much smaller than is referenced elsewhere. The default values in Tensorflow and Keras are 0.2 and 0.3 respectively. In my informal survey on this topic, I have seen references that go as low as 0.01, but nothing smaller. While in theory, any non-zero value will prevent the dying ReLU problem, in practice if the alpha is too close to zero it will result in gradients that may not be strong enough to "revive" a nearly dead unit. – 4dan Jan 24 '21 at 21:52
  • @4dan Thanks for the input, although there was no particular reason for the value; I chose it just as an example. – David Jan 25 '21 at 05:23