Why does this MCTS code miss forced wins?

Question

I'm trying to write a simple Monte Carlo Tree Search program (Python 3) that takes a board and the player whose turn it is and returns the optimal move for that player. I've written one that works somewhat, but while testing it with Tic-Tac-Toe I've found that it frequently misses moves that would force a win or prevent a loss.

Code explanation:

The game is stored as a class, simply to allow me to implement multiple different games with ease. It includes a moveList(board,player) function, which simply returns a list of all the possible moves the player could make. playGameRandom(board, player) simply plays a random game starting from the given position and returns the winner. I have tested all the other code, and I'm relatively certain it all works as intended, so I think the problem is in my MCTS implementation.

import random

def MCTS(board, player, depth):
  # get list of moves (each entry is the board after that move)
  ml = game.moveList(board, player)
  # store opposing player
  otherPlayer = game.otherPlayer(player)
  w = [0]*len(ml)
  # play (depth) random games from each starting move
  for i in range(len(ml)):
    for j in range(depth):
      winner = playGameRandom(ml[i], otherPlayer)
      if winner == player:
        # weight for a win
        w[i] = w[i] + 1
      elif winner == otherPlayer:
        # weight for a loss
        w[i] = w[i] - 2
  # w now contains scores for each move possible
  # find the highest weight
  m = max(w)
  # just in case there are multiple moves tied for max, pick randomly out of all the moves tied for max
  p = []
  for i in range(len(w)):
    if w[i] == m:
      p = p + [i]
  # generate the board that corresponds to the best move
  bestMove = ml[random.choice(p)]
  return bestMove

As I've stated, it works, but not as well as I hoped. It still misses things and makes stupid decisions occasionally, even when depth is very high (~100000). What's the problem with my implementation? Apologies in advance, I'm relatively new to all of this and I think I've bitten off more than I can chew here.

It looks like you're not doing any _tree_ search at all; you're just running `depth` random games for each possible move. That's not `depth` at all! MCTS requires that you incrementally build a tree. Consider going back to some sources; the [Wikipedia page](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search) is pretty good. - Todd Sewell, Nov 29 '22 at 15:53
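
To make the comment concrete: sampling a fixed number of random games per move scores each move by its average result against random replies, so a move that loses to one precise reply can still look good. Below is a minimal sketch of a tree-building MCTS with UCT selection for Tic-Tac-Toe. It does not use the `game` class from the question. The tuple board representation, the helpers `winner`, `other`, `move_list` and `rollout`, the `Node` class, the exploration constant `1.4`, and scoring a draw as 0.5 are all assumptions made for illustration.

```python
import math
import random

# Hypothetical board format: a tuple of 9 cells holding 'X', 'O' or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, otherwise None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def other(player):
    return 'O' if player == 'X' else 'X'

def move_list(board, player):
    """Boards reachable by `player` in one move (empty if the game is over)."""
    if winner(board) is not None:
        return []
    return [board[:i] + (player,) + board[i + 1:]
            for i in range(9) if board[i] is None]

class Node:
    def __init__(self, board, player, parent=None):
        self.board = board                 # position at this node
        self.player = player               # player to move in this position
        self.parent = parent
        self.children = []
        self.untried = move_list(board, player)
        self.visits = 0
        self.wins = 0.0                    # scored for the player who moved INTO this node

def uct_child(node, c=1.4):
    """Pick the child with the best win rate plus exploration bonus."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(board, player):
    """Play uniformly random moves to the end; return the winner or None for a draw."""
    while True:
        moves = move_list(board, player)
        if not moves:
            return winner(board)
        board = random.choice(moves)
        player = other(player)

def mcts(board, player, iterations=5000):
    root = Node(board, player)
    if not root.untried:
        return None                        # game already over, nothing to choose
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one new child if any move is still untried.
        if node.untried:
            new_board = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(new_board, other(node.player), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        result = rollout(node.board, node.player)
        # 4. Backpropagation: credit each node from its mover's point of view.
        while node is not None:
            node.visits += 1
            if result == other(node.player):
                node.wins += 1
            elif result is None:
                node.wins += 0.5
            node = node.parent
    # The most visited child is the usual "robust" final choice.
    return max(root.children, key=lambda ch: ch.visits).board
```

For example, `mcts(('X', 'X', None, 'O', 'O', None, None, None, None), 'X')` should return the board with `'X'` in cell 2. Because statistics are kept per node and the opponent's replies are also chosen by UCT, a move that allows a refutation gets driven toward a low score instead of being averaged over random replies.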
