I'm coding a simple Q-learning example, and updating the Q-values requires a maxQ' term.

I'm not sure whether maxQ' refers to the sum of all possible rewards or to the highest possible reward.

marc_s

1 Answer

maxQ' is the maximum Q-value among all possible actions in the next state s'. It is not a sum, and it is taken over Q-values rather than over immediate rewards: you compute Q(s', a') for every valid action a' in state s' and keep the largest one.
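As a rough illustration, here is a minimal tabular sketch in Python. Everything in it is an assumption made for the example rather than something from the question: the names `max_q` and `q_update`, the dictionary-based table keyed by `(state, action)`, the toy states and actions, the values of `alpha` and `gamma`, and the convention that a state with no valid actions contributes 0.

```python
def max_q(q_table, next_state, valid_actions):
    """Return the largest Q(s', a') over the valid actions a' in s'.

    Assumption: a state with no valid actions (e.g. a terminal state)
    contributes 0 to the update.
    """
    return max((q_table[(next_state, a)] for a in valid_actions), default=0.0)


def q_update(q_table, state, action, reward, next_state, valid_actions,
             alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to Q(state, action) and return the new value."""
    # The target uses the single best next-state Q-value, not a sum.
    target = reward + gamma * max_q(q_table, next_state, valid_actions)
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


# Hypothetical toy table: two states, two actions each.
q_table = {
    ("s0", "left"): 0.0, ("s0", "right"): 0.0,
    ("s1", "left"): 1.0, ("s1", "right"): 3.0,
}

# Taking "right" in s0 yields reward 1.0 and leads to s1.
new_value = q_update(q_table, "s0", "right", reward=1.0,
                     next_state="s1", valid_actions=["left", "right"])
print(new_value)
```

In this toy case maxQ' is 3.0 (the value of "right" in s1), not 4.0 (the sum of both actions), so the target is 1.0 + 0.9 * 3.0 and the updated Q("s0", "right") comes out to about 1.85.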

Afshin Oroojlooy