I'm coding a simple Q-learning example, and to update the Q-values you need a maxQ'.
I'm not sure whether maxQ' refers to the sum of all possible rewards or the highest possible reward.
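That is the maximum Q-value among all possible actions for the state s'. Basically, you need to take the max over Q(s',a') for all valid actions a' in state s'. So it is neither a sum nor an immediate reward: it is the largest estimated Q-value available from the next state.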
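To make the max concrete, here is a minimal sketch of a tabular update. The Q-table layout, the state and action names, the `max_q` and `q_update` helpers, and the `alpha`/`gamma` values are all assumptions chosen for illustration, not something from the question.

```python
# Hypothetical Q-table: Q[state][action] -> current value estimate.
Q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 3.0},
}

def max_q(Q, next_state):
    """Largest Q(s', a') over the valid actions a' in next_state.

    Returns 0.0 when the state has no listed actions (assumed terminal).
    """
    actions = Q.get(next_state, {})
    return max(actions.values()) if actions else 0.0

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update; alpha and gamma are illustrative values."""
    target = reward + gamma * max_q(Q, next_state)  # uses the max, not a sum
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]

# maxQ' for s1 is 3.0 (the larger of 1.0 and 3.0), not 4.0 (their sum).
new_value = q_update(Q, "s0", "right", reward=1.0, next_state="s1")
```

With these numbers the target is 1.0 + 0.9 * 3.0, so Q("s0", "right") moves halfway from 0.0 toward about 3.7.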