I am trying to build and understand exactly how the Monte Carlo tree search (MCTS) algorithm works in conjunction with a neural network to learn to play games like chess, but I'm having trouble understanding when to reset the search tree.
I have looked at https://github.com/suragnair/alpha-zero-general as an example implementation. One thing that doesn't make sense to me is that it resets the tree after each individual game, i.e. a new tree is created every time a new game starts. I thought the idea of MCTS was to accumulate knowledge (visit counts and value estimates) over many games, and to only reset the tree once you have retrained the network so that it predicts new probabilities for each board state.
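To make the question concrete, here is a minimal, runnable sketch of the self-play loop as I understand it from that repo. The names (`MCTS`, `play_one_game`, `update_network`, `train`) are my own simplified stand-ins, not the repo's actual API; the only point is to show *where* the tree gets reset.

```python
class MCTS:
    """Toy search tree: real trees store visit counts and Q-values."""
    def __init__(self, nnet):
        self.nnet = nnet
        self.visit_counts = {}  # (state, action) -> N(s, a); emptied on reset

def play_one_game(mcts):
    """Stand-in for self-play; returns training examples from one game."""
    return [("board_state", "policy_target", "value_target")]

def update_network(nnet, examples):
    """Stand-in for retraining; a real version fits nnet to the examples."""
    return nnet

def train(nnet, num_iters=2, games_per_iter=3):
    trees_created = 0
    for _ in range(num_iters):
        examples = []
        for _ in range(games_per_iter):
            mcts = MCTS(nnet)      # tree reset HERE: once per game
            trees_created += 1
            examples += play_one_game(mcts)
        nnet = update_network(nnet, examples)  # retrain once per iteration
    return trees_created

# With 2 iterations of 3 games each, 6 separate trees are built.
print(train(nnet=None))  # -> 6
```

What I expected instead was for `mcts = MCTS(nnet)` to be hoisted out of the inner loop, so the statistics accumulate across all games played with the same network, and only one tree per iteration (i.e. per retrained network) would be created.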
Is this me misunderstanding MCTS, or is this a bug in that particular implementation?