I'm working on a project based on the Keras Plays Catch code. I have changed the game to a simple Snake game, and for the sake of simplicity I represent the snake as a single dot on the board. If the snake eats the reward it gets +5, for hitting a wall it gets -5, and it gets -0.1 for every other move. But it's not learning a strategy and gives terrible results. Here is my game's play function:
def play(self, action):
    # Move the snake one cell: 0 = up, 1 = right, 2 = down, 3 = left
    # (row 0 is the top of the board)
    if action == 0:
        self.snake = (self.snake[0] - 1, self.snake[1])
    elif action == 1:
        self.snake = (self.snake[0], self.snake[1] + 1)
    elif action == 2:
        self.snake = (self.snake[0] + 1, self.snake[1])
    else:
        self.snake = (self.snake[0], self.snake[1] - 1)

    score = 0
    if self.snake == self.reward:
        # Ate the reward: +5 and place a new reward
        score = 5
        self.setReward()
    elif self.isGameOver():
        # Hit a wall: -5
        score = -5
    else:
        # Small living penalty for every other move
        score = -0.1

    return self.getBoard(), score, self.isGameOver()
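
For context, this is roughly how one step gets consumed during training, in the Keras Plays Catch style (step, env, model and epsilon are just placeholder names for this sketch, not exactly what is in my gist):

import numpy as np

def step(env, model, epsilon=0.1):
    # Epsilon-greedy choice over the 4 moves on the flattened board,
    # then one call to play(); env/model stand in for my game object and Keras net.
    state = env.getBoard().reshape(1, -1)
    if np.random.rand() < epsilon:
        action = np.random.randint(4)                  # explore: random move
    else:
        action = int(np.argmax(model.predict(state)))  # exploit: highest predicted Q-value
    next_state, reward, game_over = env.play(action)
    return state, action, reward, next_state, game_over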
play() returns the board from getBoard(), which looks like this (1 is the snake, 3 is the reward, and 2 represents the walls):
[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
[2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
[2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
[2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
[2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
[2. 0. 0. 0. 0. 1. 0. 0. 0. 2.]
[2. 0. 0. 0. 0. 0. 3. 0. 0. 2.]
[2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
[2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]
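
For reference, getBoard() builds that grid roughly like this (a simplified sketch; self.grid_size is just a stand-in name for the board size, which is 10 here):

import numpy as np

def getBoard(self):
    # Empty grid, walls (2) on the border, snake (1) and reward (3) inside
    board = np.zeros((self.grid_size, self.grid_size))
    board[0, :] = board[-1, :] = 2
    board[:, 0] = board[:, -1] = 2
    board[self.snake] = 1
    board[self.reward] = 3
    return board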
And here is my Q-learning code on gist.
I don't know what I'm doing wrong, but in most of the games it plays, it either gets stuck in a loop (up and down, or left and right) or runs straight into the wall, and there is only a small chance of it eating the reward before it hits the wall. How can I improve it and make it work?