Has anyone implemented Deep Q-learning (DQN) to solve a grid-world problem where the state is the player's [x, y] coordinates and the goal is to reach a specific coordinate [A, B]? The reward could be -1 for each step and +10 for reaching [A, B]. [A, B] is always the same.
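Not DQN itself, but a possible stepping stone: a minimal sketch of the environment described above, plus tabular Q-learning (a swapped-in, simpler technique) as a sanity baseline. The 5x5 grid, the start (0, 0), the goal (4, 4), the hyperparameters, and the choice that the goal-reaching step pays +10 instead of -1 are all assumptions. With only [x, y] as the state, a small grid fits in a table, so if this baseline fails to learn, the bug is likely in the environment or rewards rather than in a network.

```python
import random

# Assumed grid size and goal: the post only says [A, B] is fixed.
WIDTH, HEIGHT = 5, 5
GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left


class GridWorld:
    """State is the player's (x, y); -1 per step, +10 on the step that reaches the goal."""

    def __init__(self, width=WIDTH, height=HEIGHT, goal=GOAL):
        self.width, self.height, self.goal = width, height, goal
        self.pos = (0, 0)

    def reset(self, start=(0, 0)):  # assumed fixed start position
        self.pos = start
        return self.pos

    def step(self, action):
        dx, dy = ACTIONS[action]
        # Moves that would leave the grid keep the player in place.
        x = min(max(self.pos[0] + dx, 0), self.width - 1)
        y = min(max(self.pos[1] + dy, 0), self.height - 1)
        self.pos = (x, y)
        done = self.pos == self.goal
        reward = 10.0 if done else -1.0
        return self.pos, reward, done


def q_learning(env, episodes=1000, alpha=0.5, gamma=0.95, epsilon=0.1,
               max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (hyperparameters assumed)."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated value, missing entries count as 0

    def greedy(s):
        values = [q.get((s, a), 0.0) for a in range(len(ACTIONS))]
        best = max(values)
        return rng.choice([a for a, v in enumerate(values) if v == best])

    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)
            # Terminal transitions must not bootstrap from the next state.
            if done:
                target = r
            else:
                target = r + gamma * max(q.get((s2, b), 0.0) for b in range(len(ACTIONS)))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
            s = s2
            if done:
                break
    return q


def greedy_path(env, q, max_steps=50):
    """Follow the learned greedy policy from the start and record the visited states."""
    s = env.reset()
    path = [s]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda b: q.get((s, b), 0.0))
        s, _, done = env.step(a)
        path.append(s)
        if done:
            break
    return path


if __name__ == "__main__":
    env = GridWorld()
    print(greedy_path(env, q_learning(env)))
```

Once this baseline reaches the goal along a shortest path, replacing the table `q` with a small network that takes [x, y] as input (for example scaled to [0, 1]) is the step toward DQN.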
Surprisingly, I could not find such an implementation on Google. I tried DQN on Taxi-v3 myself and it didn't work, so I'm looking for a reference implementation like this to work my way up to my problem.
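Without the Taxi-v3 code it is only a guess what went wrong, but two details commonly break DQN on small discrete environments: feeding the raw integer state index (Taxi-v3 has 500 discrete states) to the network instead of a one-hot vector, and bootstrapping from terminal transitions. The sketch below shows both pieces in plain Python; the function names `one_hot` and `dqn_targets` are hypothetical, and lists stand in for tensors.

```python
def one_hot(state_index, n_states):
    """Encode a discrete state index (e.g. Taxi-v3's 0..499) as a one-hot input vector."""
    vec = [0.0] * n_states
    vec[state_index] = 1.0
    return vec


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bootstrap targets y = r + gamma * max_a' Q_target(s', a') * (1 - done).

    next_q_values[i] holds the target network's Q-values for every action in
    next state s'_i. Terminal transitions use the reward alone.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets
```

In a full DQN these targets are compared (e.g. with an MSE or Huber loss) against the online network's Q-value for the action actually taken in each sampled transition.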