
I am training a complex neural network architecture where I use an RNN for encoding my inputs, followed by a deep neural network with a softmax output layer.

I am now optimizing the deep neural network part of my architecture (number of units and number of hidden layers).

I am currently using sigmoid activations for all the layers. This seems to work for a few hidden layers, but as the number of layers grows, sigmoid no longer seems to be the best choice.

Do you think I should do hyperparameter optimization for sigmoid first and then for ReLU, or is it better to just use ReLU directly?

Also, does it make sense to have ReLU in the first hidden layers and sigmoid only in the last hidden layer, given that I have a softmax output? Something along the lines of the sketch below.
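
For concreteness, here is a minimal sketch of the variant I mean (Keras is used purely for illustration; the framework, layer sizes, and input/output dimensions are placeholders, not my actual setup):

```python
import tensorflow as tf

# ReLU in the early hidden layers, sigmoid only in the last hidden layer,
# and a softmax output layer. All sizes are illustrative placeholders.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),                       # e.g. the RNN encoding
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="sigmoid"),    # last hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),    # output layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```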

ryuzakinho

1 Answer


You can't optimize hyperparameters independently, no. Just because the optimal solution in the end happens to be X layers and Y nodes doesn't mean that this will hold for all activation functions, regularization strategies, learning rates, etc. This is what makes hyperparameter optimization tricky. It is also why there are libraries dedicated to it. I'd suggest you start by reading up on the concept of 'random search optimization'.
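
For intuition, here is a minimal sketch of what a joint random search looks like. It uses scikit-learn's MLPClassifier on synthetic data purely for illustration; the search-space values are assumptions, not taken from your question, and you would substitute your own model and training loop:

```python
import random

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy data standing in for your real task.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Illustrative search space: activation, depth, width, and learning rate.
search_space = {
    "activation": ["logistic", "relu"],      # sigmoid vs. ReLU
    "n_hidden_layers": [1, 2, 3, 4],
    "n_units": [32, 64, 128],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
}

best_score, best_config = -1.0, None
for _ in range(20):  # number of random trials
    # Sample every hyperparameter together: depth, width, activation, and
    # learning rate interact, so each trial evaluates one joint configuration.
    cfg = {name: random.choice(values) for name, values in search_space.items()}
    model = MLPClassifier(
        hidden_layer_sizes=(cfg["n_units"],) * cfg["n_hidden_layers"],
        activation=cfg["activation"],
        learning_rate_init=cfg["learning_rate_init"],
        max_iter=200,
        random_state=0,
    )
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_config = score, cfg

print("best validation accuracy:", best_score)
print("best configuration:", best_config)
```

The key point is that each trial draws a complete configuration rather than tuning one hyperparameter at a time, so interactions between, say, activation function and network depth are explored automatically.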

5Ke
  • Thanks. Actually, I am using particle swarm optimization for my search. I just wanted some intuition to narrow down the search space. – ryuzakinho Jun 27 '17 at 11:45