In "Stone, Peter, Richard S. Sutton, and Gregory Kuhlmann. "Reinforcement learning for robocup soccer keepaway." Adaptive Behavior 13.3 (2005): 165-188.", the RLstep pseudocode seems quite a bit different from Sarsa(λ), which the authors say RLStep implements.
Here is the RLstep pseudocode and here is the Sarsa(λ) pseudocode.
The areas of confusion are:

1. Line 10 in the Sarsa(λ) pseudocode updates the Q value for each state-action pair after adding 1 to e(s,a), but in the RLstep pseudocode the eligibility trace update (line 19) doesn't happen until after the value update (line 17). (A sketch of the ordering I was expecting is below the list.)
2. Lines 18 and 19 in RLstep seem quite different from the Sarsa(λ) pseudocode.
3. What are lines 20-25 doing with the eligibility trace?
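
For context, this is the ordering I was expecting, written as a minimal tabular Python sketch of textbook Sarsa(λ) with accumulating traces. The state/action counts and hyperparameters are arbitrary placeholders, nothing here is taken from the paper (which uses function approximation rather than a table), but the ordering question is the same:

```python
import numpy as np

# Minimal tabular Sarsa(lambda) with accumulating traces, just to show the
# ordering I expected: bump the trace for the current (s, a), then update
# *all* Q values through their traces, then decay every trace.
# n_states, n_actions, and the hyperparameters are arbitrary placeholders.
n_states, n_actions = 10, 4
alpha, gamma, lam = 0.1, 0.99, 0.9

Q = np.zeros((n_states, n_actions))
e = np.zeros((n_states, n_actions))

def sarsa_lambda_step(s, a, r, s_next, a_next):
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
    e[s, a] += 1.0               # trace incremented *before* the value update
    Q[:] += alpha * delta * e    # every state-action pair updated via its trace
    e[:] *= gamma * lam          # traces decay only after the value update

# Example transition: state 0, action 1, reward 1.0, next state 2, next action 3.
sarsa_lambda_step(0, 1, 1.0, 2, 3)
```

The point I'm anchoring on is that the trace for the current pair is incremented before the sweep over Q values, and the decay happens last; that's the ordering I can't map onto RLstep.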