
I've built a relatively simple artificial neural network in an attempt to model the value function in a Q-learning problem, but to verify that my implementation of the network is correct I am trying to solve the XOR problem.

My network architecture uses two layers, both with tanh activations, a learning rate of 0.001, bias units, and momentum set to 0.9. After each training iteration I print the squared error, and I train until the error converges to ~0.001. This works about 75% of the time, but the other 25% of the time the network converges to an error of ~0.5, which is quite large for this problem.
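For reference, here is a minimal sketch of the setup described above, written with NumPy. Everything not stated in the question (the hidden-layer width, the weight initialization, full-batch updates over the four XOR patterns, the random seed) is an assumption for illustration only:

```python
import numpy as np

# Sketch of the setup described: two layers with tanh, bias units,
# learning rate 0.001, momentum 0.9, squared error.
# ASSUMPTIONS (not from the question): 2 hidden units, uniform init,
# full-batch updates over the four XOR patterns.
rng = np.random.default_rng(0)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

lr, mom = 0.001, 0.9
W1 = rng.uniform(-1, 1, (2, 2)); b1 = np.zeros((1, 2))
W2 = rng.uniform(-1, 1, (2, 1)); b2 = np.zeros((1, 1))
vW1, vb1 = np.zeros_like(W1), np.zeros_like(b1)
vW2, vb2 = np.zeros_like(W2), np.zeros_like(b2)

for it in range(200000):
    H = np.tanh(X @ W1 + b1)          # hidden layer, tanh activation
    Y = np.tanh(H @ W2 + b2)          # output layer, tanh activation
    err = 0.5 * np.sum((Y - T) ** 2)  # squared error

    dY = (Y - T) * (1 - Y ** 2)       # d(tanh)/dz = 1 - tanh(z)^2
    dH = (dY @ W2.T) * (1 - H ** 2)

    # momentum updates
    vW2 = mom * vW2 - lr * (H.T @ dY); W2 += vW2
    vb2 = mom * vb2 - lr * dY.sum(0, keepdims=True); b2 += vb2
    vW1 = mom * vW1 - lr * (X.T @ dH); W1 += vW1
    vb1 = mom * vb1 - lr * dH.sum(0, keepdims=True); b1 += vb1

    if it % 10000 == 0:
        print(f"{err:.6f}")
```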

Here is a sample printout of the error term:

0.542649
0.530637
0.521523
0.509143
0.504623
0.501864
0.501657
0.500268
0.500057
0.500709
0.501979
0.501456
0.50275
0.507215
0.517656
0.530988
0.535493
0.539808
0.543903

The error oscillates like this indefinitely.

So the question is: is my implementation broken, or is it possible that I am running into a local minimum?

Andnp
  • See: http://stats.stackexchange.com/questions/126994/questions-about-q-learning-using-neural-networks and especially @zergylord's answer to Q3. – BadZen Dec 01 '15 at 17:44
  • Ah yes, I've read through this post quite a bit. I've tried many of the suggestions, for instance using linear output layers, ReLU layers, etc. (even Maxout, although my implementation is still a work in progress [aka broken]). But all of my attempts have led to this same issue where my error oscillates about some large value and never fully converges towards 0. My fear is that I have a bad implementation, but this may just be a symptom of local-minimum convergence; I am just not experienced enough to be able to tell the difference yet. – Andnp Dec 01 '15 at 18:11
  • 1
    I recommend doing two things: 1) verify the derivative calculations at each sample in your backprop with a numerical differentiation routine (obv. for testing and not when training for real) to make sure there are no bugs there, and 2) try training with some known-correct nonlinear optimization routine that does directed line searches (say BFGS) instead of a Q-learning regimen - this class of algorithms with only step "forward" - you should never see a `J_{t+1} >= J_t` if the implementation is correct. – BadZen Dec 01 '15 at 18:17
  • Thanks for the suggestions! I tested the derivatives and also implemented cross-entropy, which showed the same problem but with much lower probability. However, when training my network with a linear routine, the error decreased monotonically (as expected). I'll check to see how it behaves with BFGS. – Andnp Dec 01 '15 at 19:39
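Following up on BadZen's first suggestion, here is a sketch of a central-difference gradient check, assuming a `loss_fn` that evaluates the network's error for a flat parameter vector (both names are hypothetical; adapt to however your implementation stores weights):

```python
import numpy as np

def numerical_grad(loss_fn, params, eps=1e-5):
    """Central-difference estimate of dJ/dparams.

    loss_fn: callable mapping the (flat) parameter array to a scalar loss.
    params:  1-D numpy array of all weights and biases.
    """
    grad = np.zeros_like(params)
    for i in range(params.size):
        old = params[i]
        params[i] = old + eps
        j_plus = loss_fn(params)
        params[i] = old - eps
        j_minus = loss_fn(params)
        params[i] = old  # restore the original value
        grad[i] = (j_plus - j_minus) / (2 * eps)
    return grad

# Compare against your backprop gradient; for a correct implementation
# the relative error should be tiny (on the order of 1e-7):
#   num = numerical_grad(loss_fn, params)
#   rel_err = np.linalg.norm(bp - num) / (np.linalg.norm(bp) + np.linalg.norm(num))
```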
