
I have been trying my hand at reinforcement learning and deep Q-learning, and I started with a basic game of Snake, following this article: https://towardsdatascience.com/how-to-teach-an-ai-to-play-games-deep-reinforcement-learning-28f9b920440a. I successfully trained the agent to eat food. Now I want it to eat the food in a specific number of steps, say exactly 20, no more and no less. How should the reward system and policy change for this? I have tried many things, with little to no result. For example, I tried this:

    def set_reward(self, player, crash):
        self.reward = 0
        if crash:
            # Crashing always gives a fixed penalty
            self.reward = -10
            return self.reward
        if player.eaten:
            # Largest when the food is eaten after exactly 20 steps
            self.reward = 20 - abs(player.steps - 20) - player.penalty
            if player.steps == 10:
                self.reward += 10  # -abs(player.steps - 20)
            else:
                player.penalty += 1
                print("Penalty:", player.penalty)
        return self.reward

Thank you. Here is the program: https://github.com/maurock/snake-ga

Mohak Shukla

1 Answer


I would suggest this approach is problematic because, although you changed the reward function, you haven't included the number of steps in the observation space. The agent needs that information in the observation to work out when it should reach the goal. As it stands, if your agent is next to the goal and only has to turn right, the observation after five moves is exactly the same as the observation after 19 moves. You can't feed the agent the same state and expect it to take different actions: the agent never sees your reward function, it only receives a reward for the action it took in a given state. So you end up rewarding and punishing the same action in the same state, which gives the agent contradictory feedback.

Think about what happens when you come to test the agent's performance. There is no longer a reward. All you are doing is passing the network a state, and you are expecting it to choose different actions for the same state.
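To make that concrete, here is a toy sketch with a lookup table standing in for the network. The three-feature observation, the action names and the Q-values are all made up for illustration and are not taken from the linked program. It shows that a greedy policy returns the same action whenever the observation is the same, however many steps have passed, and that adding the step count to the observation turns the two situations into separate states.

```python
def greedy_action(q_table, state):
    """Return the action with the highest Q-value stored for this state."""
    action_values = q_table[state]
    return max(action_values, key=action_values.get)


# Hypothetical observation: (danger_ahead, food_left, food_right)
food_on_right = (0, 0, 1)

# Toy Q-table keyed by the observation only (values are made up)
q_table = {food_on_right: {"straight": 0.1, "left": -0.2, "right": 0.9}}

# The lookup never sees the step counter, so both calls give the same action
action_after_5_steps = greedy_action(q_table, food_on_right)
action_after_19_steps = greedy_action(q_table, food_on_right)
print(action_after_5_steps, action_after_19_steps)

# With the step count appended, the two situations are different keys,
# so the agent is able to learn a different action for each of them
q_table_with_steps = {
    food_on_right + (5,): {"straight": 0.8, "left": 0.1, "right": -0.5},
    food_on_right + (19,): {"straight": -0.3, "left": -0.4, "right": 0.9},
}
print(greedy_action(q_table_with_steps, food_on_right + (5,)))
print(greedy_action(q_table_with_steps, food_on_right + (19,)))
```

A neural network behaves the same way in this respect: at test time it is a fixed function of its input, so identical inputs produce identical outputs.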

I assume your state space is some kind of 2D array. It should be straightforward to alter the code so that the state includes the number of steps. The reward function would then be something like: if observation[num_steps] == 20: reward = 10. Ask if you need more help coding it.
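A minimal sketch of both changes, under some assumptions: the state is a flat list of features, the step counter is reset each time food is eaten, and the names augment_state and step_aware_reward are hypothetical helpers, not functions from the linked program. The -10 crash penalty comes from the question; the penalty for eating off target is my own guess at a reasonable shaping term.

```python
TARGET_STEPS = 20  # the step count asked for in the question


def augment_state(state, steps, target_steps=TARGET_STEPS):
    """Append the step count, scaled by the target, to a flat feature list.

    Scaling keeps the new feature in roughly the same range as 0/1 flags.
    """
    return list(state) + [steps / target_steps]


def step_aware_reward(crash, eaten, steps, target_steps=TARGET_STEPS):
    """Reward eating on exactly the target step; penalize missing it."""
    if crash:
        return -10  # same crash penalty as in the question
    if eaten:
        if steps == target_steps:
            return 10
        # Assumption: the penalty grows with the distance from the target
        return -abs(steps - target_steps)
    return 0


# Usage: the same board after 10 and after 20 steps is now a different input
board_features = [0, 1, 0]
print(augment_state(board_features, 10))
print(augment_state(board_features, 20))
print(step_aware_reward(crash=False, eaten=True, steps=17))
```

The augmented state is what you would pass to the network during both training and testing, so its input layer needs one extra unit.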