I'm still new to ML. I recently learned Q-learning and implemented it by hand (without using a library like Keras or TensorFlow). The problem I'm facing is how to write a good reward function for my agent. I started with the following simple reward function:
When moving from (X, Y) to (X1, Y1): return Distance((X, Y), Target) - Distance((X1, Y1), Target)
This means the agent got a positive reward whenever it moved towards the target, and it worked fine on an empty 2D plane.
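To make the formula concrete, here is a minimal sketch of that reward. It assumes Euclidean distance (Manhattan distance would fit the description equally well), and the names `distance` and `reward` are illustrative rather than the exact ones in my file.

```python
import math

def distance(x, y, target):
    """Euclidean distance from (x, y) to target (an assumption; Manhattan also fits)."""
    return math.hypot(target[0] - x, target[1] - y)

def reward(x, y, x1, y1, target):
    """Positive when the move ends closer to the target, negative when it ends farther."""
    return distance(x, y, target) - distance(x1, y1, target)

target = (5, 0)
print(reward(2, 0, 3, 0, target))  # 1.0  (one step closer)
print(reward(2, 0, 1, 0, target))  # -1.0 (one step farther)
```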
But when I added obstacles, that function no longer helped: the agent took the shortest direct path towards the target and got stuck against an obstacle forever. I added a punishment for staying in place, and it got stuck against the wall again, but this time it moved back and forth, because the total of punishment + reward over each back-and-forth was 0, and it had already collected a positive reward on the way there, so this was still the favourable path. I then added a punishment for visiting the same square twice, but I feel like this has gotten too convoluted and that there must be a simpler way to do this.
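As a worked example of why the back-and-forth looks attractive to the agent, the sketch below compares two steps of oscillating with two steps of pushing against the wall. The Euclidean distance, the wall position, and the `STAY_PENALTY` value are all assumptions made up for illustration.

```python
import math

TARGET = (5, 0)
STAY_PENALTY = -1.0  # hypothetical value, chosen only for this illustration

def step_reward(pos, new_pos):
    """Distance-difference reward, with a fixed penalty when the agent does not move."""
    if new_pos == pos:
        return STAY_PENALTY
    d_old = math.hypot(TARGET[0] - pos[0], TARGET[1] - pos[1])
    d_new = math.hypot(TARGET[0] - new_pos[0], TARGET[1] - new_pos[1])
    return d_old - d_new

# Assume a wall directly to the right of (2, 0), so the agent cannot reach (3, 0).
oscillate = step_reward((2, 0), (1, 0)) + step_reward((1, 0), (2, 0))  # away, then back
push_wall = step_reward((2, 0), (2, 0)) + step_reward((2, 0), (2, 0))  # blocked twice

print(oscillate)  # 0.0
print(push_wall)  # -2.0
```

Under these assumptions, oscillating costs nothing while pushing against the wall is penalized, so the agent has no incentive to leave the wall and take the detour.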
Starting position (Green is the agent, Red is the target)
Getting stuck in the blocked shortest direct path
After reading about it a bit, I realize there are several things I've misunderstood or done wrong with the reward: my reward can go up to 2k in a single move instead of staying in the range [-1, 1], and I have no clear distinction between when to use a negative vs. a positive reward.
My memory array (state vs. action) has n states, where n = rows * columns, and 5 actions (up, right, down, left, stay in place).
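For reference, that layout can be sketched as follows. The grid size, the row-major `state_index` mapping, and the variable names are assumptions for illustration and may not match my file exactly.

```python
ROWS, COLS = 4, 5  # hypothetical grid size
ACTIONS = ["up", "right", "down", "left", "stay"]

def state_index(row, col):
    """Map a grid cell to a single state number (row-major order, an assumption)."""
    return row * COLS + col

# One row per state, one column per action, all values starting at 0.
q_table = [[0.0] * len(ACTIONS) for _ in range(ROWS * COLS)]

print(len(q_table), len(q_table[0]))  # 20 5
print(state_index(2, 3))              # 13
```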
So, given that my agent is supposed to find the shortest available (unblocked) path to the target, what should my reward function look like, and why? Also, the algorithm I learned from didn't really specify values for Epsilon, Gamma, and LearningRate, so I set them to 0.2, 0.85, and 0.75 respectively.
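To show where those three values enter, here is a minimal sketch of the standard tabular Q-learning step with epsilon-greedy action selection. It is the textbook form, not a copy of my file, and the function names `choose_action` and `q_update` are illustrative.

```python
import random

EPSILON, GAMMA, LEARNING_RATE = 0.2, 0.85, 0.75

def choose_action(q_row, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, otherwise take the best-valued action."""
    if rng.random() < EPSILON:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

def q_update(q_table, state, action, reward, next_state):
    """Move Q(state, action) towards reward + GAMMA * max over next-state actions."""
    best_next = max(q_table[next_state])
    td_target = reward + GAMMA * best_next
    q_table[state][action] += LEARNING_RATE * (td_target - q_table[state][action])

# Tiny example: two states, five actions, one update.
q = [[0.0] * 5 for _ in range(2)]
q[1][2] = 1.0
q_update(q, state=0, action=1, reward=1.0, next_state=1)
print(q[0][1])  # approximately 1.3875 = 0.75 * (1.0 + 0.85 * 1.0)
```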
My code is in Python, in case you want to send the reward function as code.
PS: I searched for this problem on and off Stack Overflow, and all I found were references and articles that explained what a reward function should do, but gave no detail on how to make it do that or how to turn my requirement into a reward function.
Here's my code file on Github (No GUI): https://github.com/EjHam98/LearningMachineLearning/blob/master/QLearning.py