
In the reinforcement learning framework, I am a little confused about the reward and how it relates to states. For example, in Q-learning, we have the following formula for updating the Q-table:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

This means that the reward is obtained from the environment at time $t+1$: after applying the action $a_t$, the environment gives $s_{t+1}$ and $r_{t+1}$.
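To make the indexing concrete, here is how I picture it in code; this is just a minimal sketch with a made-up toy environment (nothing here comes from a specific library). The point is that the reward comes back from the same `step` call that produces the next state, which is why both get the $t+1$ index:

```python
import numpy as np

# Toy environment, invented here just for illustration: a 5-state chain
# where action 1 moves right, action 0 moves left, and reaching the
# rightmost state pays reward 1 and ends the episode.
class ChainEnv:
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done  # s_{t+1}, r_{t+1}, terminal?

alpha, gamma, epsilon = 0.1, 0.99, 0.1
env = ChainEnv()
Q = np.zeros((env.n_states, env.n_actions))
rng = np.random.default_rng(0)

for _ in range(200):  # episodes
    s_t, done = env.reset(), False
    while not done:
        # epsilon-greedy choice of a_t in s_t (ties broken at random)
        if rng.random() < epsilon:
            a_t = int(rng.integers(env.n_actions))
        else:
            a_t = int(rng.choice(np.flatnonzero(Q[s_t] == Q[s_t].max())))
        # the single env.step(a_t) call yields BOTH s_{t+1} and r_{t+1}
        s_next, r_next, done = env.step(a_t)
        # TD target: r_{t+1} + gamma * max_a Q(s_{t+1}, a), zero beyond terminal
        target = r_next + gamma * (0.0 if done else np.max(Q[s_next]))
        Q[s_t, a_t] += alpha * (target - Q[s_t, a_t])
        s_t = s_next

print(np.argmax(Q, axis=1))  # learned greedy policy (non-terminal states should prefer action 1)
```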

It is often the case, though, that the reward is associated with the previous time step, i.e. $r_t$ is used in the above formula; see, for example, the English Wikipedia page for Q-learning (https://en.wikipedia.org/wiki/Q-learning). Why is this?
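For reference, if I read that page correctly, it writes the same update with the reward index shifted by one, so only the label on the reward changes, not which quantity enters the update:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$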

Incidentally, some Wikipedia pages on the same topic but in other languages use $r_{t+1}$ (or, unexpectedly, $R_{t+1}$). See, for example, the Italian and Japanese pages.

MadMage
  • Yes, both notations are used and I don't think there is a specific reason for it; you just need to figure out what is meant from the context. Check out the [SARSA Wiki](https://en.wikipedia.org/wiki/State%E2%80%93action%E2%80%93reward%E2%80%93state%E2%80%93action) (the second paragraph, about the notation). Also, that article states that it uses the `s_t, a_t, r_t, s_{t+1}, a_{t+1}` convention but then shows the above formula, which clearly requires the `s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}` convention. – SaiBot Jan 03 '21 at 23:55
  • [Artificial Intelligence Stack Exchange](https://ai.stackexchange.com/) is probably a better place to ask theoretical questions related to reinforcement learning, so I suggest that you ask your question there. If you ask it there, please, delete it from here (to avoid cross-posting, which is generally discouraged). – nbro Jan 04 '21 at 01:06

0 Answers