I've been experimenting with Gym (and RL) a lot lately, and one specific behaviour of Gym has piqued my interest. Why does OpenAI Gym return a reward of 0 even when the game is over? For example, in Breakout-v0, when all five lives are spent, env.step returns done=True and reward=0. Shouldn't we notify the agent that such a state is unfavourable by returning a negative reinforcement/reward?
Also, every step in the environment (still Breakout-v0) returns a reward of 0 if no bricks/blocks were destroyed on that step. So how will an agent be able to differentiate between a normal action and a bad action?
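To make the behaviour I'm describing concrete, here is a minimal sketch. It does not use the real Breakout-v0: `ToyBreakout` is a hypothetical stand-in I made up so the snippet runs without Gym or the Atari ROMs, and its probabilities for hitting a brick or losing a life are arbitrary assumptions. It only mimics the `(observation, reward, done, info)` tuple that `env.step` returns in the classic Gym API.

```python
import random


class ToyBreakout:
    """Hypothetical stand-in for Breakout-v0 (NOT the real Gym environment).

    It only mimics the classic Gym step signature:
    step(action) -> (observation, reward, done, info).
    """

    def __init__(self, lives=5, seed=0):
        self.start_lives = lives
        self.lives = lives
        self.rng = random.Random(seed)

    def reset(self):
        self.lives = self.start_lives
        return 0  # dummy observation

    def step(self, action):
        # Assumed dynamics, invented purely for illustration.
        event = self.rng.random()
        reward = 0.0
        if event < 0.2:
            reward = 1.0      # a brick was destroyed
        elif event < 0.5:
            self.lives -= 1   # a life was lost; the reward stays 0
        done = self.lives == 0
        return 0, reward, done, {"lives": self.lives}


def run_episode(env):
    """Play one episode with a fixed action and record every reward."""
    env.reset()
    rewards = []
    done = False
    while not done:
        _, reward, done, _ = env.step(0)
        rewards.append(reward)
    return rewards, done


rewards, done = run_episode(ToyBreakout())
print("final step: done =", done, "reward =", rewards[-1])
print("steps with reward 0:", rewards.count(0.0), "of", len(rewards))
```

In this toy version, as in what I see with Breakout-v0, the step that ends the episode carries the same reward of 0 as any uneventful step, which is exactly what confuses me.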