
Could anybody please clarify how the MCTS algorithm iterates for the second player? (I have not found any clear example anywhere.)

Every example I see seems to play, e.g., P1's move every time. I understand the steps for one agent, but I never find any code showing where P2 places its counter, which surely must happen when growing the tree.

Essentially I would expect:

    for each iter:
        select node (Player 1), expand (Player 1)
        select node (Player 2), expand (Player 2)
        rollout, backpropagate
    next iter

Is this right? Could anybody please spell out some pseudocode showing that? Either iterative or recursive, I don't mind.

Thanks for any help.

  • Ok, thanks for the additional advice, I will take another look with that in mind. – progan01 Oct 02 '17 at 08:41
  • I am still not sure here. I was thinking the iteration must look like this: for each iter: player 1 select, player 1 expand; player 2 select, player 2 expand; rollout; backpropagate; next iter – progan01 Oct 03 '17 at 14:34

1 Answer


The trick is in the backpropagation part, where you update the "wins" variable from the point of view of the player whose move led into that position.
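Concretely (a toy illustration, using the playerJustMoved naming from the code linked below): if player 1 wins the playout, the terminal state reports 1.0 to every node where player 1 just moved and 0.0 to every node where player 2 just moved, so the updates along the path alternate perspective automatically:

```python
def get_result(winner, player_just_moved):
    # Result of a finished playout, seen from player_just_moved's point of view
    return 1.0 if winner == player_just_moved else 0.0

# Player 1 won this playout; nodes on the path back to the root
# alternate playerJustMoved = 1, 2, 1, ...
path = [1, 2, 1]
results = [get_result(1, pjm) for pjm in path]
print(results)  # [1.0, 0.0, 1.0]
```

The same terminal result thus credits a win to the P1 nodes and a loss to the P2 nodes without any explicit "whose turn is it" branching in the backup loop.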

Code for MCTS

Notice the backpropagation step inside the UCT function, especially the comments:

    # Backpropagate
    while node is not None: # backpropagate from the expanded node and work back to the root node
        node.Update(state.GetResult(node.playerJustMoved)) # state is terminal; update node with result from POV of node.playerJustMoved
        node = node.parentNode

If you follow the function calls, you will see that the visits variable is always incremented; wins, however, is only increased when the playout result favours that node's playerJustMoved.
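To make the whole loop concrete, here is a minimal runnable sketch modelled on the structure of the linked code, using Nim as a stand-in game (NimState and its rules are an assumption for illustration, not the asker's game; the method names follow the quoted snippet). Note there is no "player 1 phase" and "player 2 phase" in the iteration: DoMove flips state.playerJustMoved, so successive tree levels belong to alternating players automatically, and backpropagation scores each node from its own player's point of view.

```python
import math
import random

class NimState:
    """Nim: players alternately take 1-3 chips; whoever takes the last chip wins."""
    def __init__(self, chips):
        self.chips = chips
        self.playerJustMoved = 2  # so player 1 moves first

    def Clone(self):
        st = NimState(self.chips)
        st.playerJustMoved = self.playerJustMoved
        return st

    def DoMove(self, move):
        self.chips -= move
        self.playerJustMoved = 3 - self.playerJustMoved  # turn alternation lives here

    def GetMoves(self):
        # Legal moves are regenerated from the current state every time
        return list(range(1, min(3, self.chips) + 1))

    def GetResult(self, playerjm):
        # At a terminal state, the player who just moved took the last chip and won
        return 1.0 if self.playerJustMoved == playerjm else 0.0

class Node:
    def __init__(self, move=None, parent=None, state=None):
        self.move = move                 # move that led to this node
        self.parentNode = parent
        self.childNodes = []
        self.wins = 0.0                  # from the POV of playerJustMoved
        self.visits = 0
        self.untriedMoves = state.GetMoves()
        self.playerJustMoved = state.playerJustMoved

    def UCTSelectChild(self):
        # wins is stored from each child's own perspective,
        # so one UCB1 formula serves both players
        return max(self.childNodes,
                   key=lambda c: c.wins / c.visits
                   + math.sqrt(2 * math.log(self.visits) / c.visits))

    def AddChild(self, move, state):
        child = Node(move=move, parent=self, state=state)
        self.untriedMoves.remove(move)
        self.childNodes.append(child)
        return child

    def Update(self, result):
        self.visits += 1
        self.wins += result

def UCT(rootstate, itermax):
    rootnode = Node(state=rootstate)
    for _ in range(itermax):
        node, state = rootnode, rootstate.Clone()
        # Select: descend while the node is fully expanded and non-terminal
        while node.untriedMoves == [] and node.childNodes != []:
            node = node.UCTSelectChild()
            state.DoMove(node.move)
        # Expand: one new child, for whichever player is to move in this state
        if node.untriedMoves != []:
            m = random.choice(node.untriedMoves)
            state.DoMove(m)
            node = node.AddChild(m, state)
        # Rollout: random play to the end of the game
        while state.GetMoves() != []:
            state.DoMove(random.choice(state.GetMoves()))
        # Backpropagate: each node scores the result from its own POV
        while node is not None:
            node.Update(state.GetResult(node.playerJustMoved))
            node = node.parentNode
    return max(rootnode.childNodes, key=lambda c: c.visits).move
```

For example, from a 3-chip Nim position the player to move should take all 3 chips, and `UCT(NimState(3), 400)` settles on that move; both players' replies are explored in the same tree, one ply per level.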

Shihab Shahriar Khan
  • Thank you for your reply. I will review the code shortly. However, I still do not get the selection aspect - surely I must have to alternate between moves available to P1, get that node, then repeat everything for player 2, akin to minimax?? – progan01 Sep 28 '17 at 18:18
  • Your code looks very descriptive, thanks. I think the difficulty I am having is that in my game each player has a distinct set of moves available, like in chess, so I cannot keep (as I have seen) a single list of remaining moves - they are relative to the player's position too, like neighbours, so I must generate them anew after each move. – progan01 Sep 28 '17 at 18:23
  • About the selection aspect: every node keeps its wins/visits ratio from the perspective of the player whose move led into it, so following the same UCB formula is OK for both players. About the second comment, notice that every node has a corresponding State associated with it. You ask for all untried moves from that state (in __init__ of Node), so the untried-moves attribute of Node is of course relative to player position, generated anew after each move. – Shihab Shahriar Khan Sep 29 '17 at 05:39
  • The problem I am having is that the node does not know which moves are no longer available - it sees a picture of the board from before the other player's last move, and so the players can end up moving 'on top' of each other, or onto other invalid areas. I really cannot get this 'alternate between players in each state' within the MCTS iteration loop. – progan01 Oct 05 '17 at 15:26