For a single-player game, Q-value updates are fairly intuitive: the current state and the next state depend only on the strategy of a single player. In a two-player game this isn't the case. Consider the scenario where the opponent wins and the game terminates. How are the Q-values updated?
One common approach is to treat your opponent as part of the environment, so the state is defined to include, say, the opponent's position. You pick an action and execute it, which modifies the state. The opponent then takes its action, modifying the state again. Your agent then receives the next state s′, which is the result of both its own previous action and the opponent's response.
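Here is a minimal sketch of that idea, assuming a hypothetical toy game (a pile of stones; each player removes 1 or 2; whoever takes the last stone wins) and a fixed opponent policy passed in as a function. The class name `OpponentAsEnvironment` and its methods are illustrative only, not from any library. The opponent's move happens inside `step()`, so the agent only ever sees the state after both moves.

```python
from typing import Callable, Tuple


class OpponentAsEnvironment:
    """Hypothetical wrapper that folds a fixed opponent into the environment.

    Toy game (assumed for illustration): a pile of stones, each player
    removes 1 or 2, and whoever takes the last stone wins.
    """

    def __init__(self, start: int, opponent: Callable[[int], int]):
        self.start = start
        self.opponent = opponent  # maps pile size -> stones to remove
        self.stones = start

    def reset(self) -> int:
        self.stones = self.start
        return self.stones

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Agent's move.
        self.stones -= min(action, self.stones)
        if self.stones == 0:
            return self.stones, 1.0, True  # agent took the last stone: win
        # Opponent's move is part of the environment's transition.
        self.stones -= min(self.opponent(self.stones), self.stones)
        if self.stones == 0:
            return self.stones, -1.0, True  # opponent took the last stone: loss
        return self.stones, 0.0, False  # game continues; agent sees s'
```

For example, with `start=3` and an opponent that always takes as many as it can (`lambda s: min(2, s)`), calling `step(1)` leaves 2 stones, the opponent takes both, and the agent receives `(0, -1.0, True)`: a single transition straight to a terminal loss.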
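So if in state $s$ you take action $a$, and the opponent then acts and ends the game, you would record a transition from $s$ to a terminal state via $a$. The reward for that transition is whatever you assign to a loss (for example, −1), and because the next state is terminal there is no future value to bootstrap from: the update target is just that reward.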
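As a sketch of how that terminal transition enters a standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_b Q(s',b) - Q(s,a)\,]$, with the $\gamma \max_b Q(s',b)$ term dropped when $s'$ is terminal. The function name `q_update`, the dictionary-based Q-table, and the default values of `alpha` and `gamma` are assumptions for illustration; states and actions are any hashable values.

```python
from collections import defaultdict


def q_update(q, s, a, reward, s_next, done, actions, alpha=0.5, gamma=1.0):
    """One tabular Q-learning update for the transition (s, a, reward, s_next).

    q is a dict-like table keyed by (state, action); alpha and gamma are
    illustrative defaults. If s_next is terminal (e.g. the opponent just
    won), there is no future value, so the target is the reward alone.
    """
    if done:
        target = reward
    else:
        target = reward + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


q = defaultdict(float)
# Agent acted in state 3 with action 1; the opponent then won the game.
q_update(q, 3, 1, -1.0, 0, True, (1, 2))
```

Starting from an all-zero table, that call moves `q[(3, 1)]` to `-0.5`: the loss is credited directly to the agent's last action, even though it was the opponent's move that ended the game.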

Nick Walker