
I have a neural network written in standard C++11 which I believe follows the back-propagation algorithm correctly (based on this). If I output the error at each step of the algorithm, however, it oscillates without damping over time. I've tried removing momentum entirely and choosing a very small learning rate (0.02), but it still oscillates at roughly the same amplitude for a given network (each network settles on a different amplitude within a certain range).

Further, all inputs result in the same output (a problem I found posted here before, although for a different language; that author also mentions that he never got it working).

The code can be found here.

To summarize how I have implemented the network (a rough sketch follows the list):

  • Neurons hold the current weights to the neurons ahead of them, previous changes to those weights, and the sum of all inputs.
  • Neurons can have their value (sum of all inputs) accessed, or can output the result of passing said value through a given activation function.
  • NeuronLayers act as Neuron containers and set up the actual connections to the next layer.
  • NeuronLayers can send the actual outputs to the next layer (instead of pulling from the previous).
  • FFNeuralNetworks act as containers for NeuronLayers and manage forward-propagation, error calculation, and back-propagation. They can also simply process inputs.
  • The input layer of an FFNeuralNetwork sends its weighted values (value * weight) to the next layer. Each neuron in each layer afterwards outputs the weighted result of the activation function unless it is a bias, or the layer is the output layer (biases output the weighted value, the output layer simply passes the sum through the activation function).
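
For reference, here is a minimal sketch of the structure the list above describes. The class and member names are assumptions for illustration, not the actual code from the linked repository:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Each neuron stores its weights to the next layer, the previous weight
// changes (for momentum), and the running sum of its inputs.
struct Neuron {
    std::vector<double> weights;
    std::vector<double> prevDeltas;
    double value = 0.0;                    // sum of all weighted inputs

    // Result of passing the value through the activation (logistic sigmoid).
    double activated() const { return 1.0 / (1.0 + std::exp(-value)); }
};

// A layer owns its neurons, sets up the connections to the next layer, and
// pushes its outputs forward (instead of the next layer pulling them).
struct NeuronLayer {
    std::vector<Neuron> neurons;

    void connectTo(const NeuronLayer& next) {
        for (Neuron& n : neurons) {
            n.weights.assign(next.neurons.size(), 0.0);   // randomized elsewhere
            n.prevDeltas.assign(next.neurons.size(), 0.0);
        }
    }

    // The input layer (and bias neurons) send value * weight; hidden layers
    // send activation * weight.
    void feedForward(NeuronLayer& next, bool sendRawValue) const {
        for (std::size_t j = 0; j < next.neurons.size(); ++j) {
            double sum = 0.0;
            for (const Neuron& n : neurons)
                sum += (sendRawValue ? n.value : n.activated()) * n.weights[j];
            next.neurons[j].value = sum;
        }
    }
};

// The network owns the layers and drives forward propagation; error
// calculation and back-propagation would adjust layers[i-1]'s weights from
// the error computed in layers[i].
struct FFNeuralNetwork {
    std::vector<NeuronLayer> layers;       // front() = input, back() = output

    std::vector<double> process(const std::vector<double>& input) {
        for (std::size_t i = 0; i < input.size(); ++i)
            layers.front().neurons[i].value = input[i];
        for (std::size_t i = 0; i + 1 < layers.size(); ++i)
            layers[i].feedForward(layers[i + 1], i == 0);

        std::vector<double> out;
        for (const Neuron& n : layers.back().neurons)
            out.push_back(n.activated());  // output layer: activation of the sum
        return out;
    }
};
```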

Have I made a fundamental mistake in the implementation (a misunderstanding of the theory), or is there some simple bug I haven't found yet? If it is a bug, where might it be?

Why might the error oscillate by the amount it does (around ±(0.2 ± learning rate)) even with a very low learning rate? Why might all the outputs be the same, no matter the input?

I've gone over most of it so much that I might be skipping over something, but I think I may have a plain misunderstanding of the theory.

Cave Dweller
  • If I understand your code correctly, you use predetermined input data to train your neural net. The same data and the same calculations should yield the same results, but I get different results every time I run it. It looks like there is some randomness involved. Maybe some uninitialised data? – Christophe Jul 13 '14 at 22:38
  • @Christophe It uses predetermined input data and expected outputs for training, and the input (with its matching output) is randomly selected from that set. The initial weights are randomized, but beyond those two things nothing else is random. You are correct that there is a different result each time, and that's what I'm trying to figure out: back-propagation should have the values converge to a given point, but they don't. However, the process of back-propagating isn't random at all, hence my confusion. – Cave Dweller Jul 13 '14 at 22:43
  • Yes, I just noticed. When I comment out all the seedings with `srand()`, I get a constant output. However, I wonder whether your repeated seeding with the time might degrade the quality of the randomness by resetting the random sequence, especially since `time(0)` is generally in seconds. – Christophe Jul 13 '14 at 22:47
  • @Christophe Further, if there *were* any uninitialized data, there would almost certainly be a segmentation fault involved, except possibly with the value held in each neuron. However, the default constructor for `Neuron` handles that. I would also be seeing some compiler warnings for uninitialized variables, which I don't see. – Cave Dweller Jul 13 '14 at 22:47
  • I could try removing the seedings in the lower layers and instead have a global (or at least per-network) seeding. Let me see if that helps. – Cave Dweller Jul 13 '14 at 22:48
  • Why do you `backPropagate` from `layerCount-2` to 1? Doesn't that give you only a single level of BP? – Leeor Jul 13 '14 at 22:50
  • @CaveDweller I commented out all the multiple `srand()` calls and put a single seeding at the beginning of main(). Now, when I rerun your program multiple times, I get very close values (between 0.48 and 0.51), whereas before I got larger fluctuations (0.34 to 0.65). – Christophe Jul 13 '14 at 22:51
  • @Leeor Notice that it adjusts the weights of the layer before based on the error in the current layer. `adjustWeights(layers[i-1],...` – Cave Dweller Jul 13 '14 at 22:53
  • @Christophe Yeah, I did the same except I called srand() in the constructor of the network, right before assigning weights (`connectTo(...)`), and I'm getting similar results. – Cave Dweller Jul 13 '14 at 22:55
  • Yes, that's why you should run as long as `i>0`, but why not start from the last layer? – Leeor Jul 13 '14 at 22:55
  • @Leeor I'm holding all layers in a single array of layers; the last layer is the output layer and has a different error-calculation equation. The last hidden layer is updated, and from there all hidden layers before it, as well as the input layer, are updated. – Cave Dweller Jul 13 '14 at 22:57
  • I should also add that the network is "saved" at the end in a sort of log file, which is useful for debugging this small test network, but is ultimately intended for loading networks up later. – Cave Dweller Jul 13 '14 at 22:58
  • @CaveDweller Moved to the right place: backprop is notoriously hard to implement. Try ensuring that the gradient obtained through backprop is right by implementing an empirical differentiator: given a function f, its derivative at x is the limit as h -> 0 of (f(x+h) - f(x))/h, and (f(x+h) - f(x-h))/2h approximates f'(x) with error bounded by O(h^2). Note that this holds even in the multi-dimensional setting: with k dimensions, you repeat the process k times, obtaining that many partial derivatives. My guess is that this will help you find the error in your backprop implementation. – Pradhan Jul 13 '14 at 23:17
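
For concreteness, here is a minimal sketch of the central-difference gradient check Pradhan suggests above; `weightAt` and `lossFor` are hypothetical helpers standing in for whatever accessors the real network exposes:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Numerical gradient check: compare a back-propagated partial derivative
// against the central-difference estimate (f(w+h) - f(w-h)) / 2h, whose
// error is O(h^2). Network::weightAt(k) and Network::lossFor(sample) are
// assumed helpers, not functions from the posted code.
template <typename Network, typename Sample>
bool checkGradient(Network& net, const Sample& sample, std::size_t k,
                   double backpropGrad, double h = 1e-4) {
    double& w = net.weightAt(k);          // reference to the k-th weight
    const double original = w;

    w = original + h;
    const double lossPlus = net.lossFor(sample);
    w = original - h;
    const double lossMinus = net.lossFor(sample);
    w = original;                         // restore the weight

    const double numericGrad = (lossPlus - lossMinus) / (2.0 * h);
    const double diff = std::fabs(numericGrad - backpropGrad);

    std::printf("weight %zu: numeric %g vs backprop %g (diff %g)\n",
                k, numericGrad, backpropGrad, diff);
    return diff < 1e-6;                   // tolerance depends on h and loss scale
}
```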

1 Answer


It turns out I was just staring at the `FFNeuralNetwork` parts too much and accidentally used the wrong input set to confirm the correctness of the network. It actually does work correctly with the right learning rate, momentum, and number of iterations.

Specifically, in `main`, I was using `inputs` instead of the smaller array `in` to test the outputs of the network.

Cave Dweller