I've built a relatively simple artificial neural network in an attempt to model the value function in a Q-learning problem, but to verify that my implementation is correct, I'm first trying to solve the XOR problem.

My network has two layers, both with tanh activation, a learning rate of .001, bias units, and momentum set to .9. After each training iteration I print the error term (squared error) and run until the error converges to ~.001. This works about 75% of the time, but the other 25% of runs the network converges to ~.5 error, which is rather large for this problem.
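For reference, here is a minimal sketch of the setup I described (the variable names, weight initialization, hidden-layer width of 2, and -1/1 targets for tanh are my own choices, not necessarily matching my real code):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets; targets are -1/1 to match tanh's output range.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[-1], [1], [1], [-1]], dtype=float)

# 2-2-1 network: tanh in both layers, plus bias units.
W1 = rng.uniform(-1, 1, (2, 2)); b1 = np.zeros((1, 2))
W2 = rng.uniform(-1, 1, (2, 1)); b2 = np.zeros((1, 1))

# Momentum buffers, learning rate .001, momentum .9 as described above.
vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
lr, mom = 0.001, 0.9

errors = []
for epoch in range(20000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    y = np.tanh(h @ W2 + b2)
    errors.append(0.5 * np.mean((y - T) ** 2))

    # Backprop through both tanh layers (tanh' = 1 - tanh^2).
    d2 = (y - T) * (1 - y ** 2)
    d1 = (d2 @ W2.T) * (1 - h ** 2)

    # Momentum updates: velocity = mom * velocity - lr * gradient.
    vW2 = mom * vW2 - lr * (h.T @ d2); W2 += vW2
    vb2 = mom * vb2 - lr * d2.sum(0, keepdims=True); b2 += vb2
    vW1 = mom * vW1 - lr * (X.T @ d1); W1 += vW1
    vb1 = mom * vb1 - lr * d1.sum(0, keepdims=True); b1 += vb1

print(errors[0], errors[-1])
```

Whether a run converges or stalls near .5 error seems to depend on the random seed used for weight initialization.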
Here is a sample printout of the error term:
0.542649
0.530637
0.521523
0.509143
0.504623
0.501864
0.501657
0.500268
0.500057
0.500709
0.501979
0.501456
0.50275
0.507215
0.517656
0.530988
0.535493
0.539808
0.543903
The error keeps oscillating like this indefinitely.

So my question is: is my implementation broken, or is it possible that I'm getting stuck in a local minimum?