I'm wondering how TD learning can be incorporated into MCTS to improve its value estimates. Most TD applications use the reward obtained in the next state S', but in MCTS the reward is only observed after a whole rollout, so how can TD be implemented?
Would it be something like:
Q(s) = Q(s) + a*(Reward - Q(s))
for every node in the backpropagation stage, where Reward is the return from the rollout? Currently I update the average reward for every node, but I think a TD implementation would be better.
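Concretely, here is a rough sketch of what I'm imagining for the backpropagation step (Node, path, alpha, and gamma are just placeholders for my implementation; if rewards only arrive at the end of the rollout, reward would be 0 for interior nodes):

    class Node:
        def __init__(self, reward=0.0):
            self.Q = 0.0          # current value estimate
            self.reward = reward  # immediate reward on the transition into this node
            # ... children, visit counts, etc. omitted

    def backpropagate_td(path, rollout_return, alpha=0.1, gamma=1.0):
        """TD(0)-style backup from the leaf of `path` back to the root.

        The leaf is moved toward the rollout return; each node above it
        is moved toward r + gamma * Q(child), i.e. it bootstraps from
        the child's current estimate instead of the raw rollout return.
        """
        target = rollout_return
        for node in reversed(path):
            node.Q += alpha * (target - node.Q)    # TD(0) update
            target = node.reward + gamma * node.Q  # bootstrapped target for the parent

Is bootstrapping from the child's Q like this the right way to do it, or should every node on the path just be moved toward the raw rollout return?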
Thanks in advance