I am trying to build/understand exactly how the Monte Carlo Tree Search (MCTS) algorithm works in conjunction with a neural network to learn to play games like chess, but I'm having trouble understanding when to reset the tree.

I have looked at https://github.com/suragnair/alpha-zero-general as an example, but one thing that doesn't make sense to me about this implementation is that it resets the tree after each individual game (so a new tree is created every time a new game starts). I thought the idea of MCTS was to accumulate knowledge over many games and only reset the tree once you have trained your network to predict new probabilities for each board state?

Is this me misunderstanding MCTS, or is this a bug in that particular implementation?

Tue

1 Answer


I'll assume you're talking about AlphaZero, since that's what the repo you linked to implements; MCTS itself is a more general term.

Keeping old search trees around forever isn't really practical: it would take too much memory, and most of that data would never be used again, since games generated later in the training process hopefully look very different from the early ones. The whole point is that we distill the knowledge gained through tree search into a neural network, which we then use in future games to run an even better tree search.
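
To make this concrete, here's a minimal sketch of the AlphaZero-style outer loop in Python. The names (`self_play`, `train`, the dict-as-tree) are illustrative stand-ins, not the linked repo's actual API; the point is only to show where the tree is created and discarded.

```python
# Hypothetical sketch of an AlphaZero-style training loop; `self_play`
# and `train` are illustrative stand-ins, not the linked repo's API.

def train_alphazero(net, iterations=100, games_per_iter=25):
    for _ in range(iterations):
        examples = []
        for _ in range(games_per_iter):
            tree = {}  # a fresh search tree for this self-play game
            # self_play runs MCTS from each position and returns
            # (state, visit_count_distribution, game_outcome) tuples
            examples.extend(self_play(net, tree))
            # `tree` goes out of scope here: its statistics survive
            # only as the training targets collected in `examples`
        net = train(net, examples)  # distill the search results into the net
    return net
```

The tree is deliberately short-lived: everything worth keeping from it is the visit-count distributions, which become training targets for the network.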

As a small optimization it can still make sense to either keep the tree around for the next move, or to temporarily cache neural network outputs. I assume this is what you saw in the repo you linked.
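
For illustration, here are both variants as a rough Python sketch. The `Node` structure, `encode`, and `net.predict` are assumed names for this example, not taken from the repo.

```python
from functools import lru_cache

# Variant 1: reuse the subtree for the move actually played, drop the rest.
# Assumes a hypothetical Node with `children: dict[move, Node]` and `parent`.
def advance_root(root, move_played):
    new_root = root.children[move_played]
    new_root.parent = None  # detach so the rest of the old tree can be freed
    return new_root

# Variant 2: rebuild the tree every move, but memoize network evaluations
# per position, since consecutive searches revisit many of the same states.
# `state_key` must be hashable (e.g. a string encoding of the board), and
# `net` is the current network (hypothetical name).
@lru_cache(maxsize=200_000)
def cached_evaluate(state_key):
    return net.predict(state_key)  # returns (policy, value)

# Note: the cache must be cleared (cached_evaluate.cache_clear()) whenever
# the network is retrained, because its outputs change.
```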

Todd Sewell
  • I don't think I understand what you mean by "As a small optimization it can still make sense to either keep the tree around for the next move, or to temporarily cache neural network outputs." To me this sounds like it's normal to reset the tree after each move, which is even more aggressive than resetting it after each game? I don't understand how MCTS is useful if you don't save the visited states/actions long term. The part about distilling the MCTS information into the neural network, and then resetting, makes sense. – Tue Jan 12 '23 at 17:48
  • Indeed, a simple implementation of AlphaZero can just build the tree from scratch on every move. IIRC this is also how the paper presents the algorithm. The core idea of AlphaZero is that the NN learns to predict what the result of MCTS would be, so it acts like a fuzzy database of past tree searches. As such you don't need to additionally keep old trees around yourself. – Todd Sewell Jan 12 '23 at 22:01