
I am new to Machine Learning, and I am trying to solve MountainCar-v0 using Q-learning. I can solve the problem now, but I am still confused.

According to the MountainCar-v0's Wiki, the reward remains -1 for every step, even if the car has reached the destination. How does the invariant reward help the agent learn? If every step gives the same reward, how can the agent tell if it is a good move or a bad move?

Thanks in advance!

Jiahao Cai

1 Answer


The goal is to get the car to its destination as quickly as possible. Although the per-step reward is always -1, a fast run accumulates a higher (less negative) total return than a slow run. That difference in returns is enough for the agent to learn: Q-learning compares cumulative rewards, not individual step rewards. The reward scheme therefore encourages the agent to reach the target as soon as possible, because it only stops receiving -1 once it reaches the terminal state.
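A minimal sketch of this idea (no environment needed, just the return calculation): with a constant reward of -1 per step, the only thing that distinguishes two episodes is their length, so a shorter episode always has a higher discounted return. The episode lengths below are illustrative; 200 is MountainCar-v0's default step limit.

```python
def episode_return(num_steps, gamma=0.99):
    """Discounted return of an episode where every step gives reward -1."""
    return sum(-1 * gamma**t for t in range(num_steps))

fast = episode_return(110)  # the car reaches the goal quickly
slow = episode_return(200)  # the car times out at the step limit

# The fast run's return is less negative, so a value-based agent
# like Q-learning will prefer the actions that produced it.
print(fast, slow)
assert fast > slow
```

This is why the "invariant" reward still carries a learning signal: the agent is not comparing single rewards, it is comparing the sums of rewards that different action sequences produce.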

R.F. Nelson