
I'm working on a project based on the Keras Plays Catch code. I have changed the game to a simple Snake game, and for the sake of simplicity I represent the snake as a single dot on the board. If the snake eats the reward it gets +5, hitting a wall gives it -5, and every move costs -0.1. But it's not learning the strategy and gives terrible results. Here is my game's play function:

def play(self, action):
    # Actions move the dot one tile: 0 = up (row - 1), 1 = right (col + 1),
    # 2 = down (row + 1), anything else = left (col - 1).
    if action == 0:
        self.snake = (self.snake[0] - 1, self.snake[1])
    elif action == 1:
        self.snake = (self.snake[0], self.snake[1] + 1)
    elif action == 2:
        self.snake = (self.snake[0] + 1, self.snake[1])
    else:
        self.snake = (self.snake[0], self.snake[1] - 1)

    # Reward scheme: +5 for eating the reward, -5 for hitting a wall,
    # -0.1 for every other move.
    if self.snake == self.reward:
        score = 5
        self.setReward()  # place a new reward on the board
    elif self.isGameOver():
        score = -5
    else:
        score = -0.1

    return self.getBoard(), score, self.isGameOver()

which returns a board like this (1 is the snake, 3 is the reward, and 2 represents the wall):

[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 1. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 3. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]

and here is my Q-learning code, posted as a gist (linked in the comments below).

I don't know what I'm doing wrong, but in most of the games it plays, it either gets stuck in a loop (up and down, or left and right) or heads straight into a wall, with only a small chance of eating the reward before it hits the wall. How can I improve it and make it work?

1 Answer

If your snake never hits the reward, it may never learn about the +5 score. Instead of a constant 0.1 penalty per move, a distance-based cost for each tile will probably help. In other words, the agent in your game is not aware of the existence of the reward.

I think eventually you'll end up with something like A* pathfinding; at least the heuristics are similar.
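
For concreteness, here is a minimal sketch of that distance-based cost, assuming the same `(row, col)` tuples for `self.snake` and `self.reward` as in your `play` function; the `step_cost` name and the `scale` factor are illustrative, not from your code:

    def step_cost(snake, reward, scale=0.1):
        # Manhattan distance from the snake's head to the reward tile.
        distance = abs(snake[0] - reward[0]) + abs(snake[1] - reward[1])
        # The penalty grows with distance, so every step toward the
        # reward is strictly cheaper than a step away from it.
        return -scale * distance

In `play`, the final `else` branch would then become `score = step_cost(self.snake, self.reward)` instead of the flat `score = -0.1`.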


Update:

Considering the complete code you've posted, your loss function and the score don't match: when the score is high, your model's loss is still essentially random.

Try maximizing the game score as your training goal.
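
For comparison, this is how the original Keras Plays Catch code ties the two together: the training target for the chosen action is the immediate score plus the discounted best future Q-value, and the loss is the MSE against that target. A minimal sketch, assuming a Keras `model` that maps a board state to one Q-value per action (`gamma` and `q_targets` are illustrative names, not from your gist):

    import numpy as np

    gamma = 0.9  # discount factor for future rewards

    def q_targets(model, state, action, score, next_state, game_over):
        # Start from the model's current predictions so the loss only
        # penalizes the output for the action that was actually taken.
        targets = model.predict(state[np.newaxis], verbose=0)[0]
        if game_over:
            targets[action] = score
        else:
            # Bellman target: immediate score plus discounted best
            # achievable value from the next state.
            next_q = model.predict(next_state[np.newaxis], verbose=0)[0]
            targets[action] = score + gamma * np.max(next_q)
        return targets

Trained this way (e.g. `model.fit` with an MSE loss against these targets), a higher game score and a lower loss point in the same direction.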

  • Yes, that came to my mind too, but it's not helping. I changed the constant penalty to `score = (14 - (abs(self.snake[0] - self.reward[0]) + abs(self.snake[1] - self.reward[1]))) / 14`, which gives 0 for the longest distance, but it gets stuck in a loop again. – Amir_P Jan 22 '19 at 07:10
  • @Amir_P your distance cost doesn't seem right to me. Try a simpler penalty: `abs(c1.x-c2.x) + abs(c1.y-c2.y)`, which is `width + height` for a rectangle. Or use Euclidean distance. – knh190 Jan 22 '19 at 07:15
  • I was rewarding being close, but you mean a penalty for being far from the reward. OK, I'm testing it now. – Amir_P Jan 22 '19 at 07:25
  • OK, it's a little better, but in some cases it still gets stuck in a loop. How can I make it more general for all situations? – Amir_P Jan 22 '19 at 07:37
  • It's going up and down repeatedly even when there is a reward near it, taking the penalty and making the wrong move the whole time. I think something's wrong with my experience replay, isn't it? – Amir_P Jan 22 '19 at 09:25
  • @Amir_P can you post your complete code to a gist to reproduce the problem? Also provide a small dataset if possible. – knh190 Jan 22 '19 at 09:28
  • Thanks for your help: https://gist.github.com/Amir-P/e5203b2ffbcc70b217d22ab7910a0f7d – Amir_P Jan 22 '19 at 10:23
  • @Amir_P your loss function and the score don't match! When the score is high, your loss is still random. Your goal is to maximize the score. However, I'm not familiar with the API, so that's all I can tell. – knh190 Jan 22 '19 at 12:59
  • @Amir_P you can start a bounty for a better answer. But even though this is not a complete answer, if it's helpful you may still give it an upvote. Thanks. – knh190 Jan 22 '19 at 13:07
  • Thanks for your answer. I will work on it and try to improve it with the clues you gave me, and if I can't, I will start a bounty. – Amir_P Jan 22 '19 at 19:44