When calculating the TD error of the target network in Prioritized Experience Replay, we have, from equation (2) in Appendix B of the paper:
$$\delta_t := R_t + \gamma \max_a Q(S_t, a) - Q(S_{t-1}, A_{t-1})$$
It seems unnecessary / incorrect to me that the same formula should apply when $S_t$ is a terminal state. When computing the error used to update the action-value network, we take special care with terminal states and omit the bootstrap (reward-to-go) term, i.e. the $\gamma \max_a Q(S_t, a)$ above. See here for an example: https://jaromiru.com/2016/10/03/lets-make-a-dqn-implementation/
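For concreteness, here is a minimal NumPy sketch of how a typical DQN implementation (like the one linked above) masks out the bootstrap term for terminal transitions; the function and variable names are my own, not from the paper:

```python
import numpy as np

def td_errors(q_prev, q_next, actions, rewards, dones, gamma=0.99):
    """TD errors for a batch of transitions (S_{t-1}, A_{t-1}, R_t, S_t, done).

    q_prev:  Q-values for S_{t-1}, shape (batch, n_actions)
    q_next:  Q-values for S_t,     shape (batch, n_actions)
    actions: A_{t-1},              shape (batch,)
    rewards: R_t,                  shape (batch,)
    dones:   1.0 if S_t is terminal, else 0.0, shape (batch,)
    """
    # Bootstrap term gamma * max_a Q(S_t, a), zeroed out when S_t is terminal
    bootstrap = gamma * q_next.max(axis=1) * (1.0 - dones)
    target = rewards + bootstrap
    predicted = q_prev[np.arange(len(actions)), actions]
    return target - predicted  # delta_t; |delta_t| would be used as the priority
```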
My question is:
- Should terminal states be handled separately when calculating the TD error for Prioritized Experience Replay?
- Why / why not?