I'm implementing an AI that plays 2048 using Monte Carlo tree search. According to Wikipedia (https://en.wikipedia.org/wiki/Monte_Carlo_tree_search) and every other source I have checked, in the selection step you should use the UCB formula wi/ni + c*sqrt(ln(N)/ni) to determine which node to visit. This formula works well when the score at the end is either 0 or 1 (win or lose). However, it doesn't work in 2048, because the score is a value between 0 and n that we want to maximize.
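To make the scale problem concrete, here is a minimal Python sketch of the formula as written above. The function names, the default c = sqrt(2), and the sample numbers are all made up for illustration and do not come from any library or from a real 2048 run.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value of a child: wi/ni + c*sqrt(ln(N)/ni)."""
    if visits == 0:
        return math.inf  # assumption: unvisited children are tried first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def exploration_share(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Fraction of the UCB1 value that comes from the exploration term."""
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return explore / ucb1(total_reward, visits, parent_visits, c)

# Win/lose rewards: 6 wins in 10 visits, parent visited 100 times.
binary_share = exploration_share(6, 10, 100)

# 2048-style rewards: same visit counts, but a total score of 30000
# over 10 playouts (hypothetical numbers).
score_share = exploration_share(30000, 10, 100)

print(f"exploration share with 0/1 rewards:  {binary_share:.4f}")
print(f"exploration share with raw scores:   {score_share:.4f}")
```

With 0/1 rewards the exploration term makes up more than half of the value in this example, while with raw scores it is a tiny fraction of it, so with the same c the search would almost always pick the child with the highest average score.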
Does anyone know the best formula to use for UCB in MCTS when the score is a value between 0 and n, so that I can use it for 2048?
Thank you.