I'm creating an MCTS (Monte Carlo Tree Search) program for a 2-player game.
For this I create nodes in the tree from alternating perspectives (the root node is from the perspective of player 1, its children are from the perspective of player 2, and so on).
When determining the final move (after running many simulations) I pick the move with the highest win chance. This win chance depends on the win chances in deeper nodes. For example, assume I have 2 legal moves to make. For the first (call the associated node C1, for Child 1) I have done 100 simulations and won 25, while for the second (C2) I did 100 simulations and won 50. The first node then has a win chance of 25% versus 50% for the second, so I should prefer the second node.
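The selection rule just described can be sketched as follows (the node names and win/simulation counts are the hypothetical numbers from the example above):

```python
# Pick the child with the highest empirical win rate.
# Each entry maps a child node to (wins, simulations).
children = {
    "C1": (25, 100),  # 25 wins out of 100 simulations -> 25%
    "C2": (50, 100),  # 50 wins out of 100 simulations -> 50%
}

def win_rate(stats):
    wins, sims = stats
    return wins / sims

best = max(children, key=lambda name: win_rate(children[name]))
print(best)  # -> C2, the move with the higher statistical win chance
```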
However, this does not take into account the "likely" moves that my opponent will make. Assume that from C2 there are two possible legal moves (for my opponent); let's call these C21 and C22. I did 50 simulations for each: in C21 my opponent won 50 games out of 50 (a 100% win chance for them), and in C22 they won 0 out of 50 (0%). Given these simulations, it is much more likely that my opponent will play C21, not C22. That means that if I play C2, my *statistical* win chance is 50%, but my *expected* win chance is close to 0%.
Taking this information into account I would select move C1 rather than C2, even though its *statistical* win chance is lower. And I could program my algorithm to do exactly this, improving its performance.
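The adjusted rule I have in mind could be sketched like this: a minimax-style backup that assumes the opponent always picks their best reply. All numbers are the hypothetical ones from the example, and the function names are just illustrative:

```python
# Opponent statistics at the children of C2: (opponent wins, simulations).
c2_children = {
    "C21": (50, 50),  # opponent wins 100% -> my win chance through it is 0%
    "C22": (0, 50),   # opponent wins 0%   -> my win chance through it is 100%
}

def my_win_chance_after(opp_stats):
    opp_wins, sims = opp_stats
    return 1 - opp_wins / sims

# If the opponent plays their best reply, my expected chance through C2
# is the MINIMUM of my win chances over their options, not the average.
expected_c2 = min(my_win_chance_after(s) for s in c2_children.values())

candidates = {
    "C1": 25 / 100,      # raw win rate; no grandchildren in the example
    "C2": expected_c2,   # 0.0 after the minimax-style adjustment
}
best = max(candidates, key=candidates.get)
print(best)  # -> C1, despite its lower statistical win rate
```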
This seems like a very obvious improvement for the MCTS algorithm, but I have not seen any reference to it, which makes me suspect that I'm missing something essential.
Can anybody point out the flaw in my reasoning or point me to any articles that deal with this?