I am having trouble with my implementation of a deep neural network for the game Pong: the network always diverges, regardless of which parameters I change. I took a Pong game and implemented a Theano/Lasagne-based deep Q-learning algorithm, based on the famous Nature paper by Google's DeepMind.
What I want:
Instead of feeding the network with pixel data, I want to input the x- and y-position of the ball and the y-position of the paddle for 4 consecutive frames, so I get a total of 12 inputs.
I only want to give rewards for hitting the ball, losing a round, and winning a round.
With this configuration, the network did not converge and my agent was not able to play the game. Instead, the paddle drove straight to the top or bottom, or repeated the same pattern. So I thought I would try to make it a bit easier for the agent by adding some information.
What I did:
States:
- x-position of the ball (-1 to 1)
- y-position of the ball (-1 to 1)
- normalized x-velocity of the ball
- normalized y-velocity of the ball
- y-position of the paddle (-1 to 1)
With 4 consecutive frames, I get a total of 20 inputs (see the sketch below for how I stack them).
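For reference, this is roughly how I build the input vector from the last 4 frames (a simplified sketch; the buffer and function names are just illustrative, not my exact code):

```python
from collections import deque

import numpy as np

FRAMES = 4    # number of consecutive frames per state
FEATURES = 5  # ball x, ball y, ball vx, ball vy, paddle y

# rolling buffer holding the feature vectors of the last 4 frames
frame_buffer = deque(maxlen=FRAMES)

def make_state(ball_x, ball_y, ball_vx, ball_vy, paddle_y):
    """Append the current frame's features and return the stacked 20-dim state."""
    frame_buffer.append(np.array([ball_x, ball_y, ball_vx, ball_vy, paddle_y],
                                 dtype=np.float32))
    # at the start of an episode, pad with copies of the first frame
    while len(frame_buffer) < FRAMES:
        frame_buffer.appendleft(frame_buffer[0])
    return np.concatenate(list(frame_buffer))  # shape: (FRAMES * FEATURES,) = (20,)
```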
Rewards (see the sketch after this list):
- +10 if the paddle hits the ball
- +100 if the agent wins the round
- -100 if the agent loses the round
- -5 to 0, proportional to the distance between the predicted end position (y-position) of the ball and the current y-position of the paddle
- +20 if the predicted end position of the ball lies within the current range of the paddle (the hit is foreseeable)
- -5 if the ball is behind the paddle (no hit is possible anymore)
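To make the shaping concrete, here is a simplified sketch of that reward logic (the names predicted_end_y, paddle_half_height, and paddle_x are placeholders for my actual helpers and constants, and it assumes the agent's paddle is on the right side):

```python
def compute_reward(hit, won, lost, ball_x, paddle_y,
                   predicted_end_y, paddle_half_height, paddle_x):
    """Shaped reward as listed above; all positions are normalized to [-1, 1]."""
    reward = 0.0
    if hit:
        reward += 10.0
    if won:
        reward += 100.0
    if lost:
        reward -= 100.0
    # distance shaping: 0 when the paddle is aligned with the predicted impact
    # point, down to -5 at the maximum possible distance of 2.0
    distance = abs(predicted_end_y - paddle_y)
    reward += -5.0 * (distance / 2.0)
    # bonus when the predicted impact point is already within the paddle's reach
    if distance <= paddle_half_height:
        reward += 20.0
    # penalty once the ball has passed the paddle (paddle assumed on the right)
    if ball_x > paddle_x:
        reward -= 5.0
    return reward
```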
With this configuration, the network still diverges. I tried playing around with the learning rate (0.1 to 0.00001), the number of nodes in the hidden layers (5 to 500), the number of hidden layers (1 to 4), the batch accumulator (sum or mean), and the update rule (RMSProp or DeepMind's RMSProp).
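For context, the model I am training is essentially a small fully connected Q-network like the following (a simplified sketch in Lasagne; the layer sizes, learning rate, and the way I feed full Q-target vectors are illustrative, not my exact code):

```python
import theano
import theano.tensor as T
import lasagne

STATE_DIM = 20  # 5 features x 4 frames
N_ACTIONS = 3   # e.g. up, down, stay

states = T.matrix('states')    # (batch_size, STATE_DIM)
targets = T.matrix('targets')  # (batch_size, N_ACTIONS) Q-learning targets

# small MLP: 20 -> 50 -> 50 -> 3 Q-values
network = lasagne.layers.InputLayer((None, STATE_DIM), input_var=states)
network = lasagne.layers.DenseLayer(network, num_units=50,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=50,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=N_ACTIONS,
                                    nonlinearity=None)  # linear output layer

q_values = lasagne.layers.get_output(network)
loss = T.mean(0.5 * (targets - q_values) ** 2)  # mean squared Bellman error

params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.rmsprop(loss, params, learning_rate=0.00025, rho=0.95)

train_fn = theano.function([states, targets], loss, updates=updates)
q_fn = theano.function([states], q_values)
```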
None of these led to a satisfactory solution. The graph of the average loss usually looks something like this.
You can download my current version of the implementation here.
I would be very grateful for any hint :)
Koanashi