Should the Monte Carlo tree in calculating the previous bestMove be used to feed the next Monte Carlo search?

Question

I have seen some MCTS implementation online and how they are used in a game. A best move is calculated each move based on the state at that moment. If you have a sequence of moves in a game between human and computer like:

turn_h1,turn_c1,turn_h2,turn_c2,turn_h3,turn_c3,....turn_hn,turn_cn

turn_h(i)=human, turn_c(i)=computer and i the i-th move of a player (human/computer).

And for each computer's turn i there is a corresponding state that is used to determine the i-th best move with MCTS.

Question: Should the tree built in the (i-1)-th turn(bestmove) be used for the i-th turn(MCTS bestmove)?

I mean, should the tree which was the result of the best move in state (n-1) be used as input for determining the best move at the i-th state?

Other words can I re-use already constructed tree-nodes from previous turns/bestmoves calculations, so that I do not need to build the whole tree again?

I have created a sequence of turns in pseudo-code just to make clear what what I mean with using the (i-1)th state(tree) to feed the next MCST bestmove. (of course in real world the logic below would be implemented as an iteration/loop construct):

#start game
initial_game_state.board= initialize_board()

#turn 1
#human play
new_game_state_1 = initial_game_state.board.make_move(move_1)

#computer play
move_1 = MCTS.determine_bestmove(new_game_state_1)
new_game_state_2 = game_state_1.board.make_move(move_1)

#turn 2
#human play
new_game_state_3 = new_game_state_2.board.make_move(move_2)
#computer play
move_3 = MCTS.determine_bestmove(new_game_state_3)
new_game_state_4 = new_game_state_4.board.makeMove(move_3)

#turn 3
# ....

score 1 · Answer 1 · answered Aug 16 '19 at 20:00

Yes you can do this. This is commonly referred to as "tree reuse" (at least, that's how I usually call it).

You would start out your MCTS call (except for the very first one, in which there is no "previous tree" yet) by navigating from the root node to the node that corresponds to the one you have actually reached in the "real" game.

Note that, in a two-player alternating-move game, this does not only involve a move that your MCTS agent made, but also a move made by the opponent. Due to how MCTS work, if the opponent "surprised" your MCTS agent by selecting a move that MCTS didn't predict, it is likely that this leads to a subtree of the previous tree that had relatively few visits. In this case, tree reuse won't have much effect. But in cases where the opponent doesn't surprise you, and plays exactly what MCTS already predicted during the previous search, you may end up getting a relatively large subtree to initialise your new search with.

As for if you "should" do this, as is the literal wording in your question... you don't have to. There are many MCTS implementations out there which don't do this. I'd generally recommend it anyway. It's not too difficult to implement. It generally won't give a big boost in performance (because the playing strength of MCTS tends to scale sub-linearly with increases in "thinking time"), but it definitely shouldn't hurt either, and may give a small boost in playing strength.

Note that, in nondeterministic games, if you implement an "open-loop" variant of MCTS (without explicit chance nodes), the part of the subtree that you're "re-using" will be partially based on outdated information. In such games, it may be beneficial to discount all the statistics gathered in your previous search (i.e. multiply all your visit counts and accumulated scores by a number between 0 and 1) before starting the new search process.

Important implementation detail: when re-using the previous tree, if your new root node (which used to be a node in the middle of your previous tree) has a reference/pointer back to its parent node, make sure to set it to null. If you forget about this, all search trees of all your previous searches will fully persist in memory throughout an entire game, and you'll likely run out of memory quickly.

Should the Monte Carlo tree in calculating the previous bestMove be used to feed the next Monte Carlo search?

1 Answers1