0

Has anyone implemented deep Q-learning (DQN) to solve a grid world problem where the state is the [x, y] coordinates of the player and the goal is to reach a fixed coordinate [A, B]? The reward could be -1 for each step and +10 for reaching [A, B].

Surprisingly, I could not find such an implementation on Google. I tried DQN on Taxi-v3 myself and it didn't work, so I'm looking for a reference implementation that I can build on for my own problem.

corvo

1 Answer

1

Deep Q-learning isn't needed for grid worlds: the state space is small enough that a plain Q-table works, which is probably why few people use DQN for them. That said, I found a tutorial that applies deep Q-learning to a grid world: https://livebook.manning.com/book/deep-reinforcement-learning-in-action/chapter-3/1
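To illustrate the tabular alternative, here is a minimal sketch of Q-learning with the reward scheme from the question (-1 per step, +10 at the goal). The 5x5 grid size, the start cell (0, 0), the goal (4, 4), the hyperparameters, and the helper names `step`, `train` and `greedy_path` are all assumptions made for the example; none of them come from the question or from the linked tutorial.

```python
import random

# Assumed setup (not from the question): 5x5 grid, start (0, 0), fixed goal [A, B] = (4, 4).
SIZE = 5
GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # +y, -y, +x, -x


def step(state, action):
    """Move one cell (clamped to the grid). Reward: -1 per step, +10 at the goal."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (x, y)
    if next_state == GOAL:
        return next_state, 10.0, True
    return next_state, -1.0, False


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # One row of action values per grid cell, initialised to zero.
    q = {(x, y): [0.0] * len(ACTIONS) for x in range(SIZE) for y in range(SIZE)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: bootstrap from the best next action unless terminal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, start=(0, 0), max_steps=50):
    """Follow the greedy policy from `start` and return the visited cells."""
    state, path = start, [start]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

If this table-based version learns a path to the goal, a DQN can be debugged against it by swapping the dictionary for a small network that maps [x, y] to one value per action, while keeping the same `step` function and reward scheme.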

Tom Dörr