
When calculating the TD error of the target network in Prioritized Experience Replay, we have, from equation (2) in Appendix B of the paper:

$$\delta_t := R_t + \gamma \max_a Q(S_t, a) - Q(S_{t-1}, A_{t-1})$$

It seems unnecessary / incorrect to me that the same formula applies when $S_t$ is a terminal state. This is because when calculating the error while updating the action network, we take special care of terminal states and don't add a reward-to-go term (such as the $\gamma \max_a Q(S_t, a)$ above). See here for example: https://jaromiru.com/2016/10/03/lets-make-a-dqn-implementation/ .
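For concreteness, here is a minimal sketch (plain NumPy, not from the paper or the linked tutorial) of the terminal-state masking I mean; `q_net`, `target_net`, and the batch layout are hypothetical placeholders:

```python
import numpy as np

def td_errors(q_net, target_net, batch, gamma=0.99):
    """Per-transition TD errors; priorities would typically be |delta|.

    Assumes q_net(states) and target_net(states) return arrays of
    shape (batch_size, n_actions), and that `dones` is 1.0 for
    transitions where the next state S_t is terminal, else 0.0.
    """
    states, actions, rewards, next_states, dones = batch
    # Q(S_{t-1}, A_{t-1}): value of the action actually taken
    q_taken = q_net(states)[np.arange(len(actions)), actions]
    # max_a Q(S_t, a) from the target network
    q_next = target_net(next_states).max(axis=1)
    # Mask out the bootstrap (reward-to-go) term at terminal states
    targets = rewards + gamma * q_next * (1.0 - dones)
    return targets - q_taken  # delta_t
```

With `dones` set to 1 on terminal transitions, the target collapses to $R_t$ alone, which is the special handling described above.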

My question is:

  1. Should terminal states be handled separately when calculating TD error for Prioritized Experience Replay?
  2. Why / why not?
Srikiran
  • I don't know if I fully understood your question, but according to the TD definition, the expected Q value for a terminal state is simply the immediate reward. So, to your question: yes, it should be handled differently, since you won't have the discounted Q value term in the target value. I happened to have a post on this; maybe take a look to see if that helps: https://levelup.gitconnected.com/dqn-from-scratch-with-tensorflow-2-eb0541151049 – Tianhao Zhou Jul 17 '20 at 23:46
  • [Artificial Intelligence Stack Exchange](https://ai.stackexchange.com/) is probably a better place to ask theoretical questions related to reinforcement learning, so I suggest that you ask your question there. If you ask it there, please delete it from here (to avoid cross-posting, which is generally discouraged). (Also, LaTeX will work there :P) – nbro Jul 18 '20 at 12:41
  • @Tianhaoz: I looked at your tutorial briefly, but I don't think you have implemented prioritized experience replay? This paper introduces it: https://arxiv.org/abs/1511.05952 . I am asking about the TD error term that is computed for the priorities; I am referring to equation 2 in that paper. Hope that clarifies my question. – Srikiran Jul 18 '20 at 22:17

0 Answers