Q-learning: What is the correct state for reward calculation?

Question

Q learning - rewards

I'm struggling to interpret this pseudocode for the Q-learning algorithm:

For each s, a initialize table entry Q(a, s) = 0
Observe current state s
Do forever:
   Select an action a and execute it
   Receive immediate reward r
   Observe the new state s′ ← δ(a, s)
   Update the table entry for Q(a, s) as follows:
      Q(a, s) ← R(s) + γ * max Q(a′, s′)
   s ← s′

Should the reward be collected from the subsequent state s′ or from the current state s?

score 2 · Accepted Answer · answered Apr 02 '14 at 08:20

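The reward should be collected from the subsequent state s′, the one you enter after executing action a in state s. In the update rule, R(s) therefore stands for the immediate reward r received for that transition.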
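To make this concrete, here is a minimal sketch of the same loop in Python. The corridor environment, the `step` and `q_learning` names, the random action selection, and the choice to end an episode at the goal are all assumptions made for illustration; they are not part of the pseudocode in the question. The point to notice is that `r` is returned together with `s_next`, so the reward used in the update is the one received on entering the new state.

```python
import random


def step(state, action):
    """Hypothetical environment: a corridor with states 0..3.

    Actions are -1 (left) and +1 (right). The reward is paid on
    entering the goal state 3, i.e. it belongs to the new state.
    """
    next_state = min(max(state + action, 0), 3)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


def q_learning(episodes=200, gamma=0.9, seed=0):
    rng = random.Random(seed)
    actions = (-1, 1)
    # Initialize every table entry to 0.
    q = {(s, a): 0.0 for s in range(4) for a in actions}
    for _ in range(episodes):
        s = 0  # observe the current state
        while s != 3:  # assumption: an episode ends at the goal
            # Select an action (uniformly at random here) and execute it.
            a = rng.choice(actions)
            # Receive the immediate reward r and observe the new state.
            s_next, r = step(s, a)
            # Update Q(s, a) with the reward obtained on entering s_next.
            q[(s, a)] = r + gamma * max(q[(s_next, b)] for b in actions)
            s = s_next
    return q


if __name__ == "__main__":
    table = q_learning()
    for key in sorted(table):
        print(key, round(table[key], 3))
```

With this setup the entry for moving right from state 2 picks up the reward directly, and the states further from the goal receive it discounted by γ for each extra step.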

jorgenkg
