Based on my last question, I built a 3D game in which two robot arms play table tennis against each other. Each robot has six degrees of freedom.
The state is composed of:
- the x, y, and z position of the ball
- the 6 joint angles of the robot
All values are normalized to the range [-1, 1]. With 4 consecutive frames, I get a total input of 37 parameters.
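To make this concrete, here is a minimal sketch of how the stacked input can be assembled (the function names and the buffer are just illustrative, not my exact code):

```python
import numpy as np
from collections import deque

FRAME_STACK = 4                     # number of consecutive frames per input

# Rolling buffer that always holds the most recent frames
frames = deque(maxlen=FRAME_STACK)

def make_frame(ball_pos, joint_angles):
    """One frame: ball x/y/z plus the six joint angles, all normalized to [-1, 1]."""
    return np.concatenate([ball_pos, joint_angles]).astype(np.float32)

def stacked_input():
    """Concatenate the buffered frames into the flat vector fed to the network
    (assumes the buffer has already been filled with the first frames)."""
    return np.concatenate(frames)
```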
Rewards
- +0.3 when the player hits the ball
- +0.7 when the player wins the match
- -0.7 when the player loses the match
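As a sketch, the reward signal boils down to the following (the `event` flag is a placeholder for whatever the game engine reports, and I assume every other time step gives a reward of 0):

```python
def reward(event):
    """Per-step reward; 'event' is a placeholder flag reported by the game."""
    if event == "hit":
        return 0.3    # the player hits the ball
    if event == "win":
        return 0.7    # the player wins the match
    if event == "lose":
        return -0.7   # the player loses the match
    return 0.0        # every other time step is unrewarded
```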
Output
Each of the six robot joints can move at a certain speed, so each joint has three options: move in the positive direction, stay still, or move in the negative direction.
This results in 3^6 = 729 outputs.
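One way to picture this encoding is to read the network's chosen output index as a six-digit base-3 number, one digit per joint (this is only an illustration of the action space, not necessarily how it must be implemented):

```python
import numpy as np

NUM_JOINTS = 6
NUM_ACTIONS = 3 ** NUM_JOINTS   # 729 discrete actions

def decode_action(index):
    """Interpret an action index in [0, 729) as a 6-digit base-3 number,
    mapping each digit to a joint direction: 0 -> -1, 1 -> 0 (stay), 2 -> +1."""
    directions = np.empty(NUM_JOINTS, dtype=np.int8)
    for joint in range(NUM_JOINTS):
        index, digit = divmod(index, 3)
        directions[joint] = digit - 1
    return directions
```

For example, `decode_action(0)` moves every joint in the negative direction, and `decode_action(728)` moves every joint in the positive direction.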
With these settings, the neural network should learn the robot's inverse kinematics and how to play table tennis.
My problem is that my network converges at first, but then seems to get stuck in a local minimum and, depending on the configuration, begins to diverge afterwards.
I first tried networks with two and three hidden layers of 1000 nodes each, and after a few epochs, the network began to converge. I realized that 1000 nodes are far too many and lowered the count to 100, with the result that the network behaves as described: it converges first and then slightly diverges. So I decided to add more hidden layers instead. Currently, I am trying a network with 6 hidden layers of 80 nodes each (sketched below). The current loss looks like this:

[plot of the current loss]
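In PyTorch-style code, that architecture is roughly the following (the choice of framework and the ReLU activations are just for illustration):

```python
import torch.nn as nn

layers = []
in_features = 37                 # stacked input size from above
for _ in range(6):               # 6 hidden layers, 80 nodes each
    layers += [nn.Linear(in_features, 80), nn.ReLU()]
    in_features = 80
layers.append(nn.Linear(in_features, 729))  # one Q-value per discrete action
q_network = nn.Sequential(*layers)
```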
So what do you think, experienced machine learning experts? Do you see any problems with my configuration? Which type of network would you choose?
I'd be glad for any suggestions.