2

I tried to implement tic-tac-toe hello-world MCTS game player but I encountered a problem.

While simulating the game and choosing "the most promising" (exploit/explore) node I only take total wins number into account ("exploit" part) - this causes certain problem, the resulting algorithm is not defensive at all. As a result when choosing between

  • move that results in (100 draws; 10 loses)
  • move that results in (1 wins; 109 loses)

the worse one is chosen (1; 109) because my uct function greedily counts avg wins instead of "value".

Am I identyfing this problem correctly? Should I switch from "avg wins" to some other value metric that takes all results types into account ?

Any advice is welcome, thanks

Dennis Soemers
  • 8,090
  • 2
  • 32
  • 55
  • Depends if you are trying to win at all costs or you are trying to get the best possible result (I would generally pursue the second) – FrankS101 Feb 16 '18 at 11:03

1 Answers1

3

Since tic-tac-toe is a zero-sum game (value of a state for one player is always equal to the negated value for the opponent), your score function should also reflect this.

This means that, when computing average scores, you should use values like the following:

  • +1 for every win
  • 0 for every draw
  • -1 for every loss

In your example, this would lead to the following average scores:

  • move that results in (100 draws; 10 loses): avg. score = -10/110 = -0.0909...
  • move that results in (1 wins; 109 loses): avg. score = -108/110 = -0.98181...

So the first option would be viewed as the better option

Dennis Soemers
  • 8,090
  • 2
  • 32
  • 55