
In Stone, Peter, Richard S. Sutton, and Gregory Kuhlmann, "Reinforcement learning for RoboCup soccer keepaway," Adaptive Behavior 13.3 (2005): 165–188, the RLstep pseudocode seems quite a bit different from Sarsa(λ), which the authors say RLstep implements.

Here is the RLstep pseudocode, and here is the Sarsa(λ) pseudocode.

The areas of confusion are:

  • Line 10 in the Sarsa(λ) pseudocode updates the Q value for each state-action pair after adding 1 to e(s,a). But in the RLstep pseudocode, the eligibility trace update (line 19) doesn't happen until after the value update (line 17).

  • Lines 18 and 19 in RLstep seem quite different from the Sarsa(λ) pseudocode.

  • What are lines 20-25 doing with the eligibility trace?
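To make the comparison concrete, here is my understanding of tabular Sarsa(λ) with replacing traces, which (if I read the paper correctly) is the variant RLstep is based on. The clearing of traces for the non-selected actions in the current state is what I suspect lines 20-25 are doing. The environment interface (`reset`, `step`, `actions`) is just an assumption for illustration, not from the paper:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])

def sarsa_lambda(env, episodes, alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) with replacing traces and trace clearing.

    `env` is assumed to provide reset() -> state, step(action) ->
    (next_state, reward, done), and an `actions` list; these names are
    hypothetical, chosen only to make the sketch self-contained.
    """
    Q = defaultdict(float)          # Q[(s, a)] -> action value
    for _ in range(episodes):
        e = defaultdict(float)      # eligibility traces, reset per episode
        s = env.reset()
        a = epsilon_greedy(Q, s, env.actions, epsilon)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, env.actions, epsilon) if not done else None
            # On-policy TD error: uses the action actually chosen next
            delta = r + (gamma * Q[(s2, a2)] if not done else 0.0) - Q[(s, a)]
            # Replacing trace for the taken action; clear the traces of the
            # other actions in s (my guess at what RLstep's lines 20-25 do)
            for b in env.actions:
                e[(s, b)] = 0.0
            e[(s, a)] = 1.0
            # Update every state-action pair with a nonzero trace, then decay
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            s, a = s2, a2
    return Q
```

Note the ordering here matches the Sarsa(λ) pseudocode I pasted: the trace for (s, a) is bumped to 1 before the sweep that updates Q, whereas RLstep appears to interleave these steps differently.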
