
What is the interpretation of PReLU weights if the weights of PReLU in one layer are close to 1 and in another layer they are close to 0?

There isn't much PReLU literature around, so any help would be really appreciated!


1 Answer


The PReLU formula is:

    f(x) = x        if x > 0
    f(x) = a * x    if x <= 0
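
As a quick sanity check, here is a minimal NumPy sketch of that formula (the function name and test values are just for illustration, not from the original post):

    import numpy as np

    def prelu(x, a):
        # PReLU: identity for positive inputs, slope `a` for negative inputs.
        return np.where(x > 0, x, a * x)

    x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
    print(prelu(x, a=0.0))  # a ~ 0: behaves like a plain ReLU (negatives are zeroed out)
    print(prelu(x, a=1.0))  # a ~ 1: behaves like the identity (input passes through unchanged)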

As you can see, if a is learned to be around 0, then f(x) is almost equal to an ordinary ReLU, and the gradient from negative activations doesn't change the network. To put it simply, the network doesn't "want" to tweak inactive neurons in either direction. Practically, this also means you could probably speed up training by using a plain ReLU in this layer. It also suggests that the non-linearity really matters in that layer.

Conversely, when a is approximately 1, f(x) is almost x, i.e., it's as if there were no non-linearity at all. This suggests that the layer is probably redundant and that the network has enough freedom to form the decision boundary without it.
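
If you want to inspect this in your own model, here is a minimal sketch assuming PyTorch (the architecture below is made up purely for illustration); after training, it prints the mean learned slope of each nn.PReLU layer:

    import torch.nn as nn

    # Hypothetical model, only to show where the learned slopes live.
    model = nn.Sequential(
        nn.Linear(10, 32), nn.PReLU(num_parameters=32),
        nn.Linear(32, 32), nn.PReLU(num_parameters=32),
        nn.Linear(32, 1),
    )

    # ... train the model here ...

    for name, module in model.named_modules():
        if isinstance(module, nn.PReLU):
            a = module.weight.detach()
            print(name, "mean a =", round(a.mean().item(), 3))
            # mean a near 0 -> the layer acts like a ReLU (the kink is being used)
            # mean a near 1 -> the layer is nearly linear and may be redundant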

Maxim