I have a neural network written in standard C++11 which I believe follows the back-propagation algorithm correctly (based on this). If I output the error at each step of the algorithm, however, it seems to oscillate without damping over time. I've tried removing momentum entirely and choosing a very small learning rate (0.02), but it still oscillates at roughly the same amplitude per network (with each network having a different amplitude within a certain range).
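For reference, the per-weight update I believe I'm following is the standard gradient step with momentum. This is only a paraphrase with illustrative names, not my actual code:

```cpp
// Illustrative only (not the actual code): the textbook per-weight update
// with momentum that I believe my implementation follows.
double updateWeight(double &weight, double &prevDelta,
                    double gradient,     // dE/dw for this weight
                    double learningRate, // e.g. 0.02
                    double momentum)     // 0 when momentum is removed
{
    double delta = -learningRate * gradient + momentum * prevDelta;
    weight += delta;
    prevDelta = delta; // stored as the "previous change" for the next step
    return delta;
}
```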
Further, all inputs result in the same output (a problem I found posted here before, although for a different language; that author also mentions he never got it working).
The code can be found here.
To summarize how I have implemented the network:
- `Neuron`s hold the current weights to the neurons ahead of them, the previous changes to those weights, and the sum of all inputs.
- `Neuron`s can have their value (the sum of all inputs) accessed, or can output the result of passing that value through a given activation function.
- `NeuronLayer`s act as `Neuron` containers and set up the actual connections to the next layer.
- `NeuronLayer`s can send the actual outputs to the next layer (instead of the next layer pulling from the previous one).
- `FFNeuralNetwork`s act as containers for `NeuronLayer`s and manage forward-propagation, error calculation, and back-propagation. They can also simply process inputs.
- The input layer of an `FFNeuralNetwork` sends its weighted values (value * weight) to the next layer. Each neuron in each subsequent layer outputs the weighted result of the activation function, unless it is a bias or the layer is the output layer (biases output the weighted value; the output layer simply passes the sum through the activation function). A stripped-down sketch of this structure is below.
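In case that description is ambiguous, here is a heavily simplified sketch of the layout. The names and details are illustrative (bias handling and back-propagation are omitted, and a sigmoid is assumed); the real code is in the link above:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified sketch of the structure described above, not the actual code.
struct Neuron {
    std::vector<double> weights;    // weights to the neurons in the next layer
    std::vector<double> prevDeltas; // previous changes to those weights (for momentum)
    double sum = 0.0;               // sum of all weighted inputs

    double value() const { return sum; }
    double activated() const { return 1.0 / (1.0 + std::exp(-sum)); } // sigmoid assumed
};

struct NeuronLayer {
    std::vector<Neuron> neurons;

    // Push this layer's outputs forward into `next` (outputs are sent, not pulled).
    void feedForward(NeuronLayer &next, bool isInputLayer) const {
        for (Neuron &n : next.neurons) n.sum = 0.0;
        for (const Neuron &n : neurons) {
            // Input layer sends value * weight; later layers send activation * weight.
            double out = isInputLayer ? n.value() : n.activated(); // bias handling omitted
            for (std::size_t j = 0; j < next.neurons.size(); ++j)
                next.neurons[j].sum += out * n.weights[j];
        }
    }
};

struct FFNeuralNetwork {
    std::vector<NeuronLayer> layers; // input, hidden..., output

    std::vector<double> process(const std::vector<double> &inputs) {
        for (std::size_t i = 0; i < inputs.size(); ++i)
            layers.front().neurons[i].sum = inputs[i];
        for (std::size_t l = 0; l + 1 < layers.size(); ++l)
            layers[l].feedForward(layers[l + 1], l == 0);

        // Output layer simply passes its sums through the activation function.
        std::vector<double> out;
        for (const Neuron &n : layers.back().neurons) out.push_back(n.activated());
        return out;
    }
};
```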
Have I made a fundamental mistake in the implementation (a misunderstanding of the theory), or is there some simple bug I haven't found yet? If it is a bug, where might it be?
Why might the error oscillate by the amount it does (around ±(0.2 ± learning rate)) even with a very low learning rate? Why might all the outputs be the same, no matter the input?
I've gone over most of it so much that I might be skipping over something, but I think I may have a plain misunderstanding of the theory.