I want to train a neural network to perform signal classification.
The network has 50 inputs, each in the range [-1, 1]
hidden layers of 50 neurons each (the number of layers is not restricted)
10 outputs (one per class)
hyperbolic tangent activation (not restricted)
I am restricted to the hnn library for the training.
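To make the spec concrete, this is the forward pass of the network I mean, sketched in plain numpy rather than hnn (the weight initialization and all names here are illustrative only, not part of my actual setup):

```python
import numpy as np

# Sketch only: the shape of the network, not the hnn API.
rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 50, 50, 10            # 50 inputs, 50 hidden, 10 classes
W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))   # input  -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))  # hidden -> output weights
b2 = np.zeros(n_out)

def forward(x):
    """x: 50 values in [-1, 1]; returns 10 class scores in (-1, 1)."""
    h = np.tanh(W1 @ x + b1)
    return np.tanh(W2 @ h + b2)

x = rng.uniform(-1.0, 1.0, n_in)              # one fake input sample
predicted_class = int(np.argmax(forward(x)))
```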
My problem is that I do not know the appropriate learning rate and number of training iterations.
I have tried many settings in the following ranges:
1K to 10K training iterations
0.001 to 1.5 learning rate
But when I feed the training data back into the trained network, I get very bad results (judging by the confusion matrix): at most 2 classes are classified correctly. A sketch of the sweep I ran is below.
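The sweep was essentially this loop (plain-Python sketch; train_network and predict are placeholder stand-ins for the actual hnn calls, and the data is fake, with the same shape as mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholders standing in for the real hnn training/prediction calls.
def train_network(X, y, learning_rate, iterations):
    return None                        # hnn would fit the network here

def predict(model, x):
    return int(rng.integers(0, 10))    # hnn would return the predicted class

def confusion_matrix(model, X, y, n_classes=10):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for xi, yi in zip(X, y):
        cm[yi, predict(model, xi)] += 1
    return cm

# Fake data in my format: 50 features in [-1, 1], 10 classes.
X = rng.uniform(-1.0, 1.0, (500, 50))
y = rng.integers(0, 10, 500)

best = None
for lr in (0.001, 0.01, 0.1, 0.5, 1.0, 1.5):
    for iters in (1000, 5000, 10000):
        cm = confusion_matrix(train_network(X, y, lr, iters), X, y)
        # count a class as "classified correctly" when most of its
        # samples land on the diagonal of the confusion matrix
        ok = sum(cm[c, c] > cm[c].sum() / 2 for c in range(10))
        if best is None or ok > best[0]:
            best = (ok, lr, iters)

print("best (classes OK, lr, iterations):", best)
```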
What would be an appropriate setting of these two parameters for input data like this?
While searching the literature for similar cases, I found that different papers use different parameter settings without really explaining the reasoning behind them.
Experiments: The library has a function trainUntilErrorBelow (self-explanatory). I used it to measure how fast the network reaches a given error when I change the activation function and the number of hidden layers.
I chose the following settings:
minimum error: 300
learning rate: 0.01
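My mental model of trainUntilErrorBelow is the loop below. This is a self-contained numpy re-implementation under my own assumptions (full-batch gradient descent on the summed squared error over the training set, one hidden tanh layer), not hnn's actual code:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Fake data in my format: 100 samples, 50 inputs in [-1, 1], 10 classes
# encoded as +/-1 targets to match the tanh output range.
X = rng.uniform(-1.0, 1.0, (100, 50))
labels = rng.integers(0, 10, 100)
T = -np.ones((100, 10))
T[np.arange(100), labels] = 1.0

W1 = rng.normal(0.0, 0.1, (50, 50))   # one hidden layer of 50 neurons
b1 = np.zeros(50)
W2 = rng.normal(0.0, 0.1, (50, 10))
b2 = np.zeros(10)

def train_until_error_below(min_error, lr, max_steps=100_000):
    """Train until the summed squared error over the training set drops
    below min_error; return the wall-clock time, like my measurements."""
    global W1, b1, W2, b2
    start = time.time()
    for _ in range(max_steps):
        H = np.tanh(X @ W1 + b1)               # forward pass
        Y = np.tanh(H @ W2 + b2)
        if ((Y - T) ** 2).sum() < min_error:
            return time.time() - start
        dY = 2.0 * (Y - T) * (1.0 - Y ** 2)    # backprop through tanh
        dH = (dY @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return float("inf")                        # like the "∞" rows below

print("seconds to error < 300:", train_until_error_below(300.0, 0.01))
```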
Results:
Hyperbolic tangent:
1 hidden layer (50 neurons) - 32.12 sec
2 hidden layers (50/50 neurons) - 31.51 sec
3 hidden layers (50/50/50 neurons) - 12.18 sec
4 hidden layers (50/50/50/50 neurons) - 42.28 sec
Sigmoid:
1 hidden layer (50 neurons) - 21.32 sec
2 hidden layers (50/50 neurons) - 274.29 sec
3 hidden layers (50/50/50 neurons) - ∞ (never reached the error threshold)
4 hidden layers (50/50/50/50 neurons) - ∞ (never reached the error threshold)
Given these results, is it reasonable to conclude that the hyperbolic tangent activation with 3 hidden layers (50/50/50 neurons) is a good choice for the network architecture?