
Based on my last question, I built a 3D game in which two robot arms play table tennis with each other. Each robot has six degrees of freedom.
The state is composed of:

  • the x, y, and z position of the ball
  • the 6 joint angles of the robot

All values are normalized to the range [-1, 1]. With 4 consecutive frames, I get a total input of 37 parameters.
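As a rough sketch of how such a stacked input might be assembled (the frame dictionary layout here is hypothetical, and only the 36 values listed above are reconstructed; the 37th parameter mentioned in the question is not):

```python
import numpy as np

def make_state(frames):
    """Concatenate 4 frames of 9 normalized values each: ball (x, y, z)
    plus 6 joint angles, all scaled to [-1, 1]."""
    parts = []
    for f in frames:
        parts.append(np.asarray(f["ball"], dtype=np.float32))
        parts.append(np.asarray(f["joints"], dtype=np.float32))
    return np.concatenate(parts)

# Example: 4 identical dummy frames -> a 36-dimensional state vector.
frames = [{"ball": [0.1, -0.2, 0.3], "joints": [0.0] * 6} for _ in range(4)]
state = make_state(frames)
assert state.shape == (36,)
```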

Rewards

  • +0.3 when the player hits the ball
  • +0.7 when the player wins the match
  • -0.7 when the player loses the match
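The reward scheme above could be sketched as follows (the event flags are hypothetical names for illustration, not from the original code):

```python
def reward(hit_ball, won_match, lost_match):
    """Sum the rewards listed above for the events of one step."""
    r = 0.0
    if hit_ball:
        r += 0.3
    if won_match:
        r += 0.7
    if lost_match:
        r -= 0.7
    return r

assert reward(True, False, False) == 0.3
assert abs(reward(True, True, False) - 1.0) < 1e-9
```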

Output
Each of the six robot joints can move with a certain speed, so each joint has three possibilities: move in the positive direction, stay, or move in the negative direction. This results in 3^6 = 729 outputs.
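One way to map such a discrete action index back to per-joint commands is to read it as a base-3 number (this encoding is an assumption for illustration, not necessarily the one used in the question):

```python
def decode_action(index, n_joints=6):
    """Map an action index in [0, 3**n_joints) to a list of per-joint
    directions in {-1, 0, +1} via its base-3 digits."""
    dirs = []
    for _ in range(n_joints):
        index, digit = divmod(index, 3)
        dirs.append(digit - 1)  # digits 0, 1, 2 -> directions -1, 0, +1
    return dirs

assert decode_action(0) == [-1] * 6     # all joints negative
assert decode_action(728) == [1] * 6    # 728 = 3**6 - 1: all joints positive
# All 729 indices decode to distinct joint commands:
assert len({tuple(decode_action(i)) for i in range(729)}) == 729
```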

With these settings, the neural network should learn the inverse kinematics of the robot and learn to play table tennis. My problem is that my network converges at first but seems to get stuck in a local minimum and then, depending on the configuration, begins to diverge. I first tried networks with two and three hidden layers of 1000 nodes each, and after a few epochs the network began to converge. I realized that 1000 nodes are far too many and lowered the count to 100, with the result that the network behaves as described: it converges first and then slightly diverges. So I decided to add hidden layers. Currently I am trying a network with 6 hidden layers of 80 nodes each. The current loss looks like this: [loss plot]

So what do you think, experienced machine learning experts? Do you see any problems with my configuration? Which type of network would you choose?
I'd be glad for any suggestions.

chron0x

1 Answer


I had a similar problem in the past. The goal was to learn the inverse kinematics for a robot arm with the neuroevolution framework NEAT. The figure below shows the error plot. At the beginning everything works fine and the network improves, but at a certain point the error value stays the same, and even after 30 minutes of computation there was no change. I don't think that your neural network is wrong, or that the number of neurons is wrong. I think that neural networks are in general not capable of learning the inverse kinematics problem. I also think that the famous DeepMind paper ("Playing Atari with Deep Reinforcement Learning") is bogus.

[Figure: error plot for neural network inverse kinematics]

But back to the facts. The plot in the OP (loss average) and my plot (population fitness) both show an improvement at the beginning and, after a certain time, a plateau that cannot be improved, despite the fact that the CPU is running at 100% searching for a better solution. It is unclear how long the neural network has to be optimized before a significant improvement becomes visible, and perhaps even after days or years of constant computation no better solution will be found. A look into the literature shows that this result is normal for every medium or hard problem, and that until now no better neural networks or learning algorithms have been invented. The underlying problem is called combinatorial explosion: there are many millions of possible solutions for the network weights, and a computer can only scan a small fraction of them. If the problem is really easy, like the XOR problem, a learning algorithm like backpropagation or RPROP-minus will find a solution. On slightly harder problems like navigating a maze, finding inverse kinematics, or a peg-in-hole task, no current neural network will find a solution.

Manuel Rodriguez
  • Thanks very much for your answer, this helps. In consideration of the combinatorial explosion, my approach would be to add more rewards, for example for the distance between a possible hit position and the position of the paddle, or to first show the network some moves of a good player and let it learn from those afterwards. Why do you think the paper is bogus? I am pretty new to the field and would like to hear your opinion. It would be great if you could write me a message or send me an article which expresses your opinion. – chron0x Oct 21 '16 at 07:53