
From my understanding, the goal of the playout (simulation) stage in MCTS is to obtain a result and then reward or punish the nodes on the path from the root during backpropagation. (Please correct me if I am wrong.)

My question is whether I can use a domain-knowledge heuristic to obtain this result, rather than actually simulating the game to the end.

The reason I am asking is that I am working on something similar to pathfinding, where the goal is to find a path to the goal state (node). That means simulating the game all the way to the end is very difficult in my case.

Mark Jin

2 Answers


Yes, you definitely can. I have personally done this in game domains where it is not feasible to run a sufficient number of simulations all the way to terminal states.

If you always terminate simulations early and evaluate them using a heuristic evaluation function, you lose the guarantee that UCT (the most common MCTS implementation) has of finding the optimal action given an infinite amount of processing time. But you rarely have an infinite amount of processing time in practice anyway. In domains where it IS feasible to run enough simulations to the end, early termination would probably be detrimental (unless the heuristic function is very good and allows you to run many more simulations).
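To make the idea concrete, here is a minimal sketch of a depth-limited playout: it plays random moves until a terminal state or a depth cap is reached, then scores the final state with a heuristic instead of a true game result. The function names (`rollout`, `heuristic`, etc.) and the toy one-dimensional example are my own illustration, not from any particular MCTS library.

```python
import random

def rollout(state, successors, is_terminal, heuristic, max_depth=20, rng=random):
    """Depth-limited random playout. Instead of simulating to the end,
    stop after max_depth random moves and return the heuristic value of
    the state reached (or the exact value if a terminal state is hit)."""
    depth = 0
    while not is_terminal(state) and depth < max_depth:
        state = rng.choice(successors(state))  # pick a random successor state
        depth += 1
    return heuristic(state)

# Toy example: a 1-D walk trying to reach position 10 from position 0.
# The heuristic is the negative distance to the goal, so values closer
# to zero mean the playout ended nearer the goal.
goal = 10
successors = lambda s: [s - 1, s + 1]
terminal = lambda s: s == goal
heur = lambda s: -abs(goal - s)

random.seed(0)
value = rollout(0, successors, terminal, heur, max_depth=8)
print(value)  # a heuristic estimate of the position, not a true game outcome
```

This `value` would then be backpropagated exactly as a terminal result would be; the rest of the UCT loop (selection, expansion, backpropagation) is unchanged.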

Dennis Soemers

I actually found the paper "Monte-Carlo Planning for Pathfinding in Real-Time Strategy Games", which uses the inverse of the Euclidean distance to the goal as the reward.
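As a rough sketch of that reward, assuming a 2-D position (the paper's exact formulation may differ), one could write something like this; the small `eps` term is my own addition to avoid division by zero when the goal itself is reached:

```python
import math

def pathfinding_reward(pos, goal, eps=1e-6):
    """Inverse Euclidean distance to the goal: the reward grows as the
    playout ends closer to the goal. eps guards against division by zero
    at the goal position itself (an assumed detail, not from the paper)."""
    d = math.dist(pos, goal)  # Euclidean distance (Python 3.8+)
    return 1.0 / (d + eps)

print(pathfinding_reward((0, 0), (3, 4)))  # distance 5, so reward ~0.2
```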

Mark Jin