I have a binary classification problem for my neural network.
I already got good results using the ReLU activation function in my hidden layer and the sigmoid function in the output layer. Now I'm trying to improve on that. I added a second hidden layer, also with ReLU, and the results improved. Then I switched the second hidden layer from ReLU to leaky ReLU and got even better results, but I'm not sure whether this is even allowed.
So I have something like this:

- Hidden layer 1: ReLU activation function
- Hidden layer 2: leaky ReLU activation function
- Output layer: sigmoid activation function
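In code, the setup looks roughly like the sketch below (PyTorch is just for illustration, and the layer sizes are placeholders, not my actual ones):

```python
import torch
import torch.nn as nn

# Minimal sketch: two hidden layers with different activations,
# sigmoid on the output for binary classification.
model = nn.Sequential(
    nn.Linear(20, 64),   # input size 20 is a placeholder
    nn.ReLU(),           # hidden layer 1: ReLU
    nn.Linear(64, 32),
    nn.LeakyReLU(),      # hidden layer 2: leaky ReLU
    nn.Linear(32, 1),
    nn.Sigmoid(),        # output layer: sigmoid
)

# Dummy forward pass
x = torch.randn(8, 20)
probs = model(x)         # shape (8, 1), values in (0, 1)
```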
I can't find many resources on this, and the ones I did find always use the same activation function for all hidden layers.