I am a programming enthusiast, so please excuse me and help fill in any gaps. From what I understand, getting good results from a neural network requires the sigmoid (activation function) and either the learning rate or the step rate (depending on the training method) to be set correctly, along with the number of training iterations.

While there is a lot of material about these values, the principle of generalization, and avoiding overfitting, there doesn't seem to be much focus on their relationship with the data and the network.

I've noticed that where these settings best land seems to scale with the number of samples, neurons, and inputs (more or fewer inputs may change the iterations required, for example).

Is there a mathematical way to find a good (approximate) starting point for the sigmoid, learning rate, step rate, number of iterations, and the like, based on known values such as samples, inputs, outputs, layers, etc.?

Catnaps909
  • If your model is trained with back-propagation, I can tell you that you only need one hidden layer. With respect to the other things you ask about, I'm sure the correct parameters depend on (and are tuned to) the input and output spaces. At least that's what I remember from the literature 10 years ago. – Robinson Apr 17 '15 at 09:53
  • Even if your model uses back-propagation, you might want to use more than one hidden layer. While neural networks with one hidden layer are universal approximators, more layers can be beneficial depending on which activation function you use, see deep learning. – Cesar Apr 17 '15 at 17:51

1 Answer

Before the deep learning explosion, one common way to determine the best number of parameters in your network was to use Bayesian regularization. Bayesian regularization is a method to avoid overfitting even if your network is larger than necessary.
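
As a sketch of the idea (this is the standard MacKay-style formulation, not tied to any particular library), Bayesian regularization replaces the plain sum-of-squared-errors objective with a weighted combination of the data error and a penalty on the weights:

\[
F(\mathbf{w}) = \beta E_D(\mathbf{w}) + \alpha E_W(\mathbf{w}),
\qquad
E_D = \sum_i (t_i - y_i)^2,
\qquad
E_W = \sum_j w_j^2 ,
\]

where the hyperparameters \(\alpha\) and \(\beta\) are re-estimated during training. The weight penalty drives weights the data does not support toward zero, which is why an oversized network does not automatically overfit.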

Regarding the learning/step rate, the problem is that a small step rate can make learning notoriously slow, while a large one may make your network diverge. Thus, a common technique was to use a learning method that automatically adjusts the learning rate, accelerating when possible and decelerating in difficult regions of the gradient. A simple rule of this kind is sketched below.
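
For illustration only (this is the classic 'bold driver' heuristic; the constants are conventional choices, not prescriptions):

\[
\eta_{k+1} =
\begin{cases}
1.05\,\eta_k & \text{if the error decreased this epoch,}\\
0.5\,\eta_k & \text{otherwise (rejecting the step).}
\end{cases}
\]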

As such, a common way to train neural networks while taking care of both problems was to use the Levenberg-Marquardt algorithm with Bayesian regularization. Levenberg-Marquardt is adaptive in the sense that it adjusts a damping factor after every iteration, letting it move between Gauss-Newton updates (using second-order information) and gradient-descent updates (using only first-order information) as needed.
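
Concretely, the textbook form of the update (with \(J\) the Jacobian of the residual vector \(e\)) is

\[
\Delta \mathbf{w} = -\left(J^\top J + \lambda I\right)^{-1} J^\top e .
\]

When the damping factor \(\lambda\) is small, the step approaches a Gauss-Newton step; when \(\lambda\) is large, it approaches a short gradient-descent step \(-\tfrac{1}{\lambda} J^\top e\). Decreasing \(\lambda\) after a successful step and increasing it after a failed one is what plays the role of an adaptive learning rate.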

It can also give you an estimate of the number of parameters that you really need in your network. Here, the number of parameters means the total number of weights over all neurons in the network. You can then use this estimate to decide how many neurons you should be using in the first place.
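
In the Gauss-Newton approximation to Bayesian learning (the Foresee-Hagan formulation behind trainbr; treat the exact expressions as a sketch), the effective number of parameters is

\[
\gamma = N - 2\alpha\,\operatorname{tr}\!\left(H^{-1}\right),
\qquad
H \approx 2\beta\, J^\top J + 2\alpha I ,
\]

where \(N\) is the total number of weights, and the hyperparameters are re-estimated at each iteration as \(\alpha = \gamma / (2 E_W)\) and \(\beta = (n - \gamma)/(2 E_D)\), with \(n\) the number of training errors. If \(\gamma\) comes out well below \(N\), the network has more weights than the data supports.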

This method is implemented by the MATLAB function trainbr. However, since you also included the accord-net tag, I should also say that it is implemented by the LevenbergMarquardtLearning class (you might want to use the latest alpha version in NuGet in case you are dealing with multiple output problems).
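
For the Accord.NET side, a minimal sketch could look like the code below. It assumes the 2.x-era API, where ActivationNetwork and SigmoidFunction still live in AForge.Neuro (later versions absorbed them into Accord.Neuro), so check the namespaces and constructor overloads against the version you actually install:

```csharp
using AForge.Neuro;             // ActivationNetwork, SigmoidFunction (older versions)
using Accord.Neuro.Learning;    // LevenbergMarquardtLearning

class Example
{
    static void Main()
    {
        // Toy XOR problem: two inputs, one output.
        double[][] inputs =
        {
            new double[] { 0, 0 },
            new double[] { 0, 1 },
            new double[] { 1, 0 },
            new double[] { 1, 1 },
        };
        double[][] outputs =
        {
            new double[] { 0 },
            new double[] { 1 },
            new double[] { 1 },
            new double[] { 0 },
        };

        // Deliberately oversized hidden layer; Bayesian regularization
        // should keep the unneeded weights from being used.
        var network = new ActivationNetwork(new SigmoidFunction(), 2, 10, 1);

        // The second argument enables Bayesian regularization.
        var teacher = new LevenbergMarquardtLearning(network, true);

        for (int epoch = 0; epoch < 100; epoch++)
            teacher.RunEpoch(inputs, outputs);

        // Approximate number of parameters the network actually uses.
        var effective = teacher.EffectiveParameters;
    }
}
```

After training, EffectiveParameters gives the estimate discussed above, which you can compare against the total weight count to decide whether a smaller network would suffice.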

Cesar
  • Thanks, Cesar, for taking the time to answer; your library and your dedication to it are greatly appreciated. I assume by parameters you mean the EffectiveParameters? – Catnaps909 Apr 17 '15 at 19:11
  • Yes, this property should give the approximate number of parameters being effectively used by the network - by the way, thanks! I hope the framework can be useful for you! – Cesar Apr 17 '15 at 20:36