
We assign a +1 reward for reaching the goal and a -1 reward for reaching an unwanted state.

Is it necessary to also give something like a +0.01 reward for an action that moves the agent closer to the goal and a -0.01 reward for an action that does not?

What would be the significant changes with the reward scheme mentioned above?
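For concreteness, a sketch of the scheme I have in mind (a toy gridworld; the GOAL cell and the distance measure are just assumptions for illustration):

    GOAL = (9, 9)  # hypothetical goal cell in a gridworld

    def distance_to_goal(state):
        # Manhattan distance as an assumed measure of "near to the goal".
        return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])

    def dense_reward(state, next_state):
        # The scheme I am asking about: small hints on every step,
        # on top of the terminal +1/-1.
        if distance_to_goal(next_state) < distance_to_goal(state):
            return 0.01   # the action moved the agent closer to the goal
        return -0.01      # the action did not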

Jay Joshi

1 Answer


From Sutton and Barto's book, Section 3.2 Goals and Rewards:

It is thus critical that the rewards we set up truly indicate what we want accomplished. In particular, the reward signal is not the place to impart to the agent prior knowledge about how to achieve what we want it to do. For example, a chess-playing agent should be rewarded only for actually winning, not for achieving subgoals such as taking its opponent's pieces or gaining control of the center of the board. If achieving these sorts of subgoals were rewarded, then the agent might find a way to achieve them without achieving the real goal. For example, it might find a way to take the opponent's pieces even at the cost of losing the game. The reward signal is your way of communicating to the robot what you want it to achieve, not how you want it achieved.

So, in general, it's a good idea to avoid introducing prior knowledge through the reward function, because it can lead to undesired results.
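For instance, a reward function in the spirit of the quote (and of the +1/-1 setup in the question) rewards only the outcomes we actually care about. A minimal sketch, where GOAL and TRAP are assumed terminal states of a toy gridworld:

    GOAL, TRAP = (9, 9), (5, 5)  # assumed terminal states

    def sparse_reward(next_state):
        # Reward *what* we want achieved, never *how*.
        if next_state == GOAL:
            return 1.0
        if next_state == TRAP:
            return -1.0
        return 0.0  # every other transition is neutral: no hints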

However, it is known that RL performance can be improved by guiding the agent's learning process through the reward function. In fact, in some complex tasks it is necessary to first guide the agent to a secondary (easier) goal, and then change the reward so that it learns the primary goal. This technique is known as reward shaping. An old but interesting example can be found in Randløv and Alstrøm's paper: Learning to Drive a Bicycle using Reinforcement Learning and Shaping.
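One well-known way to add such guidance without distorting the task is potential-based shaping (Ng, Harada and Russell, 1999): you add F(s, s') = γ·Φ(s') − Φ(s) to the environment reward, and the optimal policy of the original task provably stays the same. A minimal sketch, assuming the same toy gridworld and a potential equal to the negative distance to the goal:

    GOAL = (9, 9)   # assumed goal cell, as above
    GAMMA = 0.99    # assumed discount factor

    def phi(state):
        # Potential function: larger (less negative) closer to the goal.
        return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))

    def shaped_reward(env_reward, state, next_state):
        # r' = r + gamma * phi(s') - phi(s). The shaping terms telescope
        # over an episode, so they guide exploration without changing
        # which policy is optimal (Ng et al., 1999).
        return env_reward + GAMMA * phi(next_state) - phi(state)

Note that this recovers the intuition from the question: a step toward the goal earns a small bonus and a step away a small penalty, but because the bonus is potential-based, the agent cannot profit from chasing the hints instead of the actual goal.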

Pablo EM
  • Thank you for the answer and the suggestion about reward shaping! – Jay Joshi Nov 06 '17 at 14:47
  • I am working on a project where the goal is to survive in an environment for as long as the agent can. It's basically Pacman. So there is no +1 reward in my situation, just a -1 reward when the agent is killed by a ghost. Will this work? Is it necessary to have a positive reward? – Jay Joshi Nov 06 '17 at 15:19
  • Actually, I don't have enough previous experience with game-based environments, so I can't give you any specific advice regarding Pacman. However, it seems there are people who have worked on very similar problems. I guess you can get some inspiration from reading other people's work. – Pablo EM Nov 06 '17 at 17:04
  • Thank you so much for the reward shaping concept. I used it in my project and it is working very well, even better than when I specified rewards for each action the agent takes. I trained it for many iterations that way, and now I am training with all the additional rewards removed, so just the basic rewards remain, and the agent is making smart, unexpected moves. Thank you :) – Jay Joshi Nov 07 '17 at 08:31
  • Wow! Very happy to hear I was helpful :) – Pablo EM Nov 07 '17 at 08:40