
I am currently training several hundred different permutations of neural networks. Using Levenberg-Marquardt backpropagation yields results relatively fast, but I would prefer to use gradient descent for now for academic reasons. Unfortunately, gradient descent is so slow that I simply stop it, because it would take too long to train all of the networks.

Are there ways to speed up the gradient descent process, preferably not involving parallel computing techniques?

mesllo
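
For context, a minimal sketch of the two training setups the question contrasts, assuming the MATLAB Neural Network Toolbox (`fitnet`/`train`) that the comments below confirm is being used; the hidden layer size and the cosine target are illustrative placeholders, not the asker's actual experiment.

```matlab
% Sketch only: switching between the two training functions compared above.
x = linspace(-pi, pi, 200);    % toy inputs (assumed)
t = cos(2 * x);                % toy targets (assumed)

net = fitnet(10);              % hidden layer size is a placeholder

net.trainFcn = 'trainlm';      % Levenberg-Marquardt: typically converges in few epochs
% net.trainFcn = 'traingd';    % batch gradient descent: much slower, as described

[net, tr] = train(net, x, t);
```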
  • What variant of gradient descent are you using? Stochastic (online), batch, or mini-batch? Are you using any additions such as momentum? There are a few techniques to improve gradient descent learning performance; some of them change the maths (how fast the gradient is followed, e.g. independently for each weight), so they can tend to arrive at slightly different end points. – Neil Slater Mar 01 '15 at 20:04
  • As far as I know it's batch. I simply specify 'traingd', which invokes the batch gradient descent backpropagation algorithm as the training process for the network. More info [here](http://www.mathworks.com/help/nnet/ref/traingd.html). I am not using any additional techniques, because then I am not sure I would realise whether the training happened correctly. However, I am also provided with a visualization of the curve being fitted, which I think would be a good indication. – mesllo Mar 01 '15 at 23:48
  • Looks like about the only thing you can try is to adjust the learning rate upwards. That may not affect the many-network comparison you are making, even if it prevents some networks reaching a better-optimised solution. Try it out on a small subset of networks to see if it helps speed without affecting convergence. Training of NNs can be very slow, and there are many competing techniques and active research into making it faster; implementing them requires either that your library already implements them, or that you are able to adjust the lower-level code. – Neil Slater Mar 02 '15 at 06:39
  • Could you expand on "academic reasons" in your question? If you don't want to even consider e.g. momentum, then you are going to be limited. However, if this is about measuring convergence, then you may not need to exclude the more advanced techniques; you may just need help in understanding how to measure the end result to satisfy your "academic reasons". – Neil Slater Mar 02 '15 at 06:45
  • My aim here is to approximate a number of cosine functions with varying parameters on a number of differently structured neural networks. The program then determines, for each parameter setting, which network structure is best. The reason I am trying to use gradient descent instead is that it will be easier to write up and explain in my review (I'm a student), and I would rather not go into LM and end up with incorrect theory (since I have already covered gradient descent). I actually would not mind considering momentum, changing the learning rate, etc. – mesllo Mar 02 '15 at 15:23
  • I suppose in the end I will try tweaking the momentum and learning rate and see which helps the most; if not, I will use LM instead and review it as best I can. The end result would be based on the mean squared error (it's already measured by the MATLAB NN toolbox); other than that, I'm unsure what else I can add to the results. [A sketch of these options follows these comments.] – mesllo Mar 02 '15 at 15:25
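
Since no answer was posted, here is a minimal sketch of the options raised in the comments above (raising the learning rate, adding momentum, or using an adaptive learning rate), again assuming the MATLAB Neural Network Toolbox interface; the hidden layer size, learning rate, momentum constant, and epoch count are illustrative assumptions, not the asker's actual settings.

```matlab
% Sketch only: the speed-up options discussed in the comments, on a toy cosine fit.
x = linspace(-pi, pi, 200);       % toy inputs (assumed)
t = cos(2 * x);                   % targets for one parameter setting (assumed)

net = fitnet(10);                 % hidden layer size is a placeholder

% Option 1: plain batch gradient descent with a larger learning rate.
net.trainFcn = 'traingd';
net.trainParam.lr = 0.05;         % try raising this and watch convergence
net.trainParam.epochs = 2000;

% Option 2: gradient descent with momentum (uncomment to use instead).
% net.trainFcn = 'traingdm';
% net.trainParam.lr = 0.05;
% net.trainParam.mc = 0.9;        % momentum constant

% Option 3: variable learning rate plus momentum (uncomment to use instead).
% net.trainFcn = 'traingdx';

[net, tr] = train(net, x, t);     % train on the toy data
y = net(x);                       % network output
mseValue = perform(net, t, y);    % mean squared error (fitnet's default measure)
```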

0 Answers