I'm designing the reward function for a DQN model, which seems to be the trickiest part of deep reinforcement learning. I looked at several examples and noticed that the reward is usually kept within [-1, 1]. If the negative reward is triggered less often (that is, it is more "sparse") than the positive reward, the positive reward may be set lower than 1.
I would like to know why I should always try to keep the reward within this range (sometimes it is [0, 1], other times [-1, 0] or simply -1). What is the theory or principle behind the range?
I went through this answer; it mentioned that setting 500 as the positive reward and -1 as the negative reward will destroy the network. But how would it destroy the model?
I can vaguely understand that this is related to gradient descent, and that it is actually the gap between rewards that matters, not their sign or absolute value. But I'm still missing a clear explanation of how it can destroy the network, and why this particular range is used.
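To make my current understanding concrete, here is a minimal sketch of a one-step TD error with and without reward clipping. It assumes a squared-error loss 0.5 * delta^2, whose gradient with respect to Q(s, a) has magnitude |delta|. The helper names `td_error` and `clip_reward` are my own, not from any library, and the Q-values are set to 0 as if the network were freshly initialized. Whether this difference in scale is really what "destroys" the network is exactly what I'm unsure about.

```python
def td_error(reward, q_sa, max_q_next, gamma=0.99, done=False):
    """One-step TD error: (r + gamma * max_a' Q(s', a')) - Q(s, a)."""
    target = reward if done else reward + gamma * max_q_next
    return target - q_sa


def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw reward into [low, high] (hypothetical helper)."""
    return max(low, min(high, reward))


# Assumed starting point: all Q-values are 0.
q_sa, max_q_next = 0.0, 0.0

for raw_reward in (500.0, -1.0):
    delta_raw = td_error(raw_reward, q_sa, max_q_next)
    delta_clipped = td_error(clip_reward(raw_reward), q_sa, max_q_next)
    # With loss 0.5 * delta^2, |delta| is the gradient magnitude w.r.t. Q(s, a).
    print(f"reward={raw_reward:7.1f}  |delta| raw={abs(delta_raw):6.1f}  "
          f"clipped={abs(delta_clipped):4.1f}")
```

With the 500 / -1 scheme, a single positive transition produces an update 500 times larger than a negative one, while clipping puts both on the same scale.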
Besides, when should I use rewards in [0, 1], and when should I use only negative rewards? Within a given number of timesteps, both approaches seem able to push the agent toward the highest total reward. Only in a situation where I want the agent to reach the final point as soon as possible does a negative reward seem more appropriate than a positive one.
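Here is a small sketch of the comparison I have in mind, using made-up episodes: one reaches the goal in 3 steps, the other in 10. The reward schemes and the `discounted_return` helper are my own assumptions for illustration, and the return is undiscounted (gamma = 1) to keep the arithmetic simple.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of gamma^t * r_t over one episode, computed backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


short_steps, long_steps = 3, 10  # hypothetical episode lengths

# Scheme A: -1 on every step until the goal is reached.
penalty_short = [-1.0] * short_steps
penalty_long = [-1.0] * long_steps

# Scheme B: 0 on every step, +1 only on reaching the goal.
bonus_short = [0.0] * (short_steps - 1) + [1.0]
bonus_long = [0.0] * (long_steps - 1) + [1.0]

print("step penalty:", discounted_return(penalty_short),
      discounted_return(penalty_long))
print("goal bonus:  ", discounted_return(bonus_short),
      discounted_return(bonus_long))
```

Under the step penalty, the shorter episode has the higher return (-3 versus -10), so the agent is pushed to finish quickly. Under the goal bonus, both episodes return 1 when gamma = 1, so only a discount factor below 1 would make the shorter path preferable.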
Is there a criterion for measuring whether the reward is designed reasonably? For example, could I sum the Q-values of the good action and the bad action, so that if the rewards are symmetrical, the final sum should be around zero, which would mean training has converged?