\begin{equation}
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha \left( R_{t+1} + \gamma \max_{a} Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right)
\end{equation}

The equation above contains the term $\max_a Q_t(s_{t+1}, a)$. Suppose that taking an action in state $s_t$ leads to $s_{t+1}$, and there are no available moves in $s_{t+1}$ because the game has ended in a draw. What is $\max_a Q_t(s_{t+1}, a)$ then?

Abhishek Bhatia

1 Answer


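The value of a terminal (a.k.a. absorbing) state is 0 by definition, for both the V and Q functions, as stated in Section 3.7 of Sutton and Barto's book, Reinforcement Learning: An Introduction. So when $s_{t+1}$ is terminal, $\max_a Q_t(s_{t+1}, a) = 0$ and the update target reduces to $R_{t+1}$.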
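To make the terminal case concrete, here is a minimal sketch of a tabular Q-learning update in Python. The function name `q_update`, the dictionary-based table, and the convention that an empty `next_actions` list marks a terminal state are all assumptions made for illustration, not something taken from the question or the book.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    Assumption: an empty `next_actions` means `next_state` is terminal,
    so its value is taken to be 0.
    """
    if next_actions:
        # Non-terminal: bootstrap from the best action in the next state.
        best_next = max(Q[(next_state, a)] for a in next_actions)
    else:
        # Terminal (e.g. the game ended in a draw): value is 0 by definition.
        best_next = 0.0
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical example: the move from "s_t" ends the game in a draw.
Q = defaultdict(float)
new_value = q_update(Q, "s_t", "a_t", reward=0.5,
                     next_state="draw", next_actions=[])
print(new_value)
```

With the terminal value fixed at 0, the update only moves $Q(s_t, a_t)$ toward the final reward. Some implementations pass an explicit `done` flag instead of checking for an empty action list; either convention gives the same target.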

Pablo EM