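Page 19 of David Silver's "Lecture 2: Markov Decision Processes" gives the following derivation:

$$
\begin{aligned}
v(s) &= \mathbb{E}[G_t \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma (R_{t+2} + \gamma R_{t+3} + \dots) \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]
\end{aligned}
$$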
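I found that $\mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s]$ is equal to $\mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]$,
which seems to mean $G_{t+1} = v(S_{t+1})$, and therefore $G_t = v(S_t)$.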
According to the definition of the return:

$$G_t = R_{t+1} + \gamma R_{t+2} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
and according to $G_t = v(S_t)$:

$$v(S_t) = R_{t+1} + \gamma R_{t+2} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
But the definition of the value function is

$$v(s) = \mathbb{E}[G_t \mid S_t = s]$$
which means
$$v(s) = \mathbb{E}[G_t \mid S_t = s] = \mathbb{E}[v(S_t) \mid S_t = s]$$
which is absolutely wrong.
My questions are:
- Why is $G_{t+1} = v(S_{t+1})$?
- Where is the mistake in my derivation?
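
To make the first question concrete, here is a small numerical sketch. Everything in it is an assumption made up for illustration and not taken from the lecture: the two states `A` and `B`, the terminal state `T`, the rewards, the transition probabilities, and $\gamma = 0.9$. It computes $v(s)$ from the Bellman equation and compares it with individually sampled returns $G_t$. In this toy setting the sampled returns differ from episode to episode and only their average agrees with $v(s)$, which suggests that the last step of the derivation holds inside the expectation and not as an equality between $G_{t+1}$ and $v(S_{t+1})$ themselves.

```python
import random

GAMMA = 0.9  # assumed discount factor

# Hypothetical MRP: state -> (reward received on leaving, [(next_state, prob), ...]).
# "T" is a terminal state with value 0.
MRP = {
    "A": (1.0, [("B", 0.5), ("T", 0.5)]),
    "B": (2.0, [("A", 0.5), ("T", 0.5)]),
}


def exact_values(sweeps=200):
    """Iterate v(s) = R(s) + gamma * sum_s' P(s, s') v(s') until it has converged."""
    v = {"A": 0.0, "B": 0.0, "T": 0.0}
    for _ in range(sweeps):
        for s, (r, transitions) in MRP.items():
            v[s] = r + GAMMA * sum(p * v[s2] for s2, p in transitions)
    return v


def sample_return(state, rng):
    """Sample one return G_t = R_{t+1} + gamma * R_{t+2} + ... starting in `state`."""
    g, discount = 0.0, 1.0
    while state != "T":
        r, transitions = MRP[state]
        g += discount * r
        discount *= GAMMA
        next_states, probs = zip(*transitions)
        state = rng.choices(next_states, weights=probs)[0]
    return g


rng = random.Random(0)  # fixed seed so the run is repeatable
v = exact_values()
returns = [sample_return("A", rng) for _ in range(20000)]
mean_return = sum(returns) / len(returns)

print("v(A) from the Bellman equation:", round(v["A"], 3))
print("mean of sampled G_t from A:   ", round(mean_return, 3))
print("smallest / largest sampled G_t:", min(returns), max(returns))
```

If the single returns were equal to $v(S_t)$, the smallest and largest sampled values would coincide. Here they do not, while the sample mean stays close to `v["A"]`.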