In Sarsa(λ) with accumulating eligibility traces (http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node77.html), the algorithm given doesn't match the formula.

The formula says E ← ɣλE + 1

whereas the algorithm first updates E ← E + 1 and then E ← ɣλE, making the update effectively
E ← ɣλ(E + 1)

Which is correct? I have also seen research papers with the exact same formula and algorithm.

Is it a discrepancy in the publication, i.e. did they miss a pair of brackets around E + 1?
If so, how is it that most research papers replicate the same error?

Or, if I've misunderstood something, please point it out.

jaggi

1 Answer

I think they didn't miss any brackets; it is E ← ɣλE + 1. E decays by ɣλ on every step for every state, and the state that is currently visited additionally gets +1, so the 1 refers to the current s. There is a figure at http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node75.html that may help you understand this idea better; it's the one between Eq. (7.5) and Eq. (7.6).
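
Written out as a case distinction, the rule is roughly the following. This is only a sketch in what I take to be the book's notation: s_t is assumed to be the state visited at step t, and e_{t-1}(s) the trace of state s before that step.

```latex
e_t(s) =
\begin{cases}
\gamma\lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t \\
\gamma\lambda\, e_{t-1}(s)     & \text{otherwise}
\end{cases}
```

Read this way, the +1 is applied after the decay, so it is not itself multiplied by ɣλ within the same step.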

user186199
  • I had seen that figure. My point is: why does the algo effectively update with E ← ɣλE + ɣλ – jaggi Oct 21 '16 at 04:10
  • instead of E ← ɣλE + 1 (for the current state s)? Figure 7.11 (http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node77.html) says: for all states do E ← ɣλE, which includes the current state, whose eligibility trace has been incremented by 1 just before that 'decay' for loop – jaggi Oct 21 '16 at 04:17
  • I can't understand your point. Have you seen Eq. (7.13), and does it make sense to you? If so, then in Figure 7.11, lines 8 and 11 do what Eq. (7.13) says. Maybe your point is that, for the current s,a, e(s,a) is updated by both line 8 and line 11, so the net update is E ← ɣλE + ɣλ rather than E ← ɣλE + 1. Gosh, I thought I had understood this; now I feel as confused as you do. – user186199 Oct 21 '16 at 10:21
  • I also have a similar confusion related to this Sarsa: http://stackoverflow.com/questions/40166586/how-to-understand-the-rlstep-in-keepaway-compare-with-sarsa. Very strange that nobody else seems to find the eligibility trace confusing. – user186199 Oct 21 '16 at 10:28
  • Haha, I finally understand it. Line 11 implements the second line of Eq. (7.13), and it does so for the next step, i.e. for e_{t+1}. Line 8 is for the first line of Eq. (7.13), E ← ɣλE + 1: since E ← ɣλE was already done by line 11 of the previous step, here it only needs to do E ← E + 1. In short, line 8 of the current step combined with line 11 of the previous step does what the first line of Eq. (7.13) says. – user186199 Oct 21 '16 at 10:36
  • I didn't totally get it. Let me think about it; I'll ask if I still have doubts. – jaggi Oct 24 '16 at 09:11
  • I also found a sentence just above Eq. (7.5) (http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node75.html): "On each step, the eligibility traces for all states decay by ɣλ, and the eligibility trace for the one state visited on the step is incremented by 1:" This sentence already makes it clear if you pay attention to the key words "for all states" and "incremented by 1". – user186199 Oct 24 '16 at 10:11
  • Thanks for the clarification. – jaggi Jan 18 '17 at 16:36
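
The resolution reached in the comments can be checked numerically. Below is a minimal Python sketch, not code from the book: it tracks traces for states only (rather than state-action pairs) to keep it short, the values of GAMMA and LAMBDA are arbitrary, and the function names are made up for illustration. One function applies the formula directly (decay, then +1 for the visited state); the other follows the loop order discussed above (increment, use the trace for the update, then decay for the next step) and records the trace at the point where the value update would read it.

```python
GAMMA, LAMBDA = 0.9, 0.8      # illustrative values, not taken from the book
DECAY = GAMMA * LAMBDA


def traces_from_formula(visits, n_states):
    """Trace vector e_t for each step, computed directly from the recursive formula."""
    e = [0.0] * n_states
    history = []
    for s_t in visits:
        e = [DECAY * x for x in e]   # every state decays by gamma * lambda
        e[s_t] += 1.0                # the visited state is incremented by 1
        history.append(list(e))
    return history


def traces_from_algorithm(visits, n_states):
    """Trace vector as read by the value update, using the order increment, update, decay."""
    e = [0.0] * n_states
    history = []
    for s_t in visits:
        e[s_t] += 1.0                # what the comments call line 8: increment
        history.append(list(e))      # the value update would use the trace here
        e = [DECAY * x for x in e]   # what the comments call line 11: decay for the next step
    return history


if __name__ == "__main__":
    visits = [0, 1, 0, 0, 2]         # hypothetical sequence of visited states
    formula = traces_from_formula(visits, n_states=3)
    algorithm = traces_from_algorithm(visits, n_states=3)
    for step, (f, a) in enumerate(zip(formula, algorithm)):
        print(step, f, a)
```

The two histories agree because the decay at the end of one step plays the role of the ɣλE term at the start of the next. The value ɣλ(E + 1) does appear after the decay loop, but only as an intermediate that is stored for the next step; it is never the trace used in an update.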