
In the reinforcement learning framework, I am a little confused about the reward and how it relates to states. For example, in Q-learning, we have the following formula for updating the Q-table:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

This means that the reward is obtained from the environment at time $t+1$: after applying the action $a_t$, the environment gives $s_{t+1}$ and $r_{t+1}$.
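To make the indexing concrete, here is how I picture it in code; this is just a minimal sketch with a made-up toy environment (nothing here comes from a specific library). The point is that the reward comes back from the same `step` call that produces the next state, which is why both get the $t+1$ index:

```python
import numpy as np

# Toy environment, invented here just for illustration: a 5-state chain
# where action 1 moves right, action 0 moves left, and reaching the
# rightmost state pays reward 1 and ends the episode.
class ChainEnv:
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done  # s_{t+1}, r_{t+1}, terminal?

alpha, gamma, epsilon = 0.1, 0.99, 0.1
env = ChainEnv()
Q = np.zeros((env.n_states, env.n_actions))
rng = np.random.default_rng(0)

for _ in range(200):  # episodes
    s_t, done = env.reset(), False
    while not done:
        # epsilon-greedy choice of a_t in s_t (ties broken at random)
        if rng.random() < epsilon:
            a_t = int(rng.integers(env.n_actions))
        else:
            a_t = int(rng.choice(np.flatnonzero(Q[s_t] == Q[s_t].max())))
        # the single env.step(a_t) call yields BOTH s_{t+1} and r_{t+1}
        s_next, r_next, done = env.step(a_t)
        # TD target: r_{t+1} + gamma * max_a Q(s_{t+1}, a), zero beyond terminal
        target = r_next + gamma * (0.0 if done else np.max(Q[s_next]))
        Q[s_t, a_t] += alpha * (target - Q[s_t, a_t])
        s_t = s_next

print(np.argmax(Q, axis=1))  # learned greedy policy (non-terminal states should prefer action 1)
```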

It is often the case, though, that the reward is associated with the previous time step, i.e. $r_t$ is used in the above formula; see, for example, the English Wikipedia page for Q-learning (https://en.wikipedia.org/wiki/Q-learning). Why is this?
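For reference, if I read that page correctly, it writes the same update with the reward index shifted by one, so only the label on the reward changes, not which quantity enters the update:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$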

Incidentally, some Wikipedia pages on the same topic but in other languages use $r_{t+1}$ (or, unexpectedly, $R_{t+1}$). See, for example, the Italian and Japanese pages.

MadMage
  • Yes, both notations are used and I don't think there is a specific reason for it; you just need to figure out what is meant from the context. Check out the [SARSA Wiki](https://en.wikipedia.org/wiki/State%E2%80%93action%E2%80%93reward%E2%80%93state%E2%80%93action) (the second paragraph, about the notation). Also, that article states that it uses the `s_t, a_t, r_t, s_{t+1}, a_{t+1}` convention but then shows the above formula, which clearly requires the `s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}` convention. – SaiBot Jan 03 '21 at 23:55
  • [Artificial Intelligence Stack Exchange](https://ai.stackexchange.com/) is probably a better place to ask theoretical questions related to reinforcement learning, so I suggest that you ask your question there. If you ask it there, please, delete it from here (to avoid cross-posting, which is generally discouraged). – nbro Jan 04 '21 at 01:06

0 Answers