I'm trying to write a simple Monte Carlo Tree Search program (Python 3) that takes a board and the player whose turn it is, and returns the optimal move for that player. I've written one that somewhat works, but while testing it with Tic-Tac-Toe I've found that it frequently misses chances to force a win or prevent a loss.
Code explanation:
The game is stored as a class, simply so that I can implement multiple different games with ease. It includes a moveList(board, player) function, which returns a list of all the possible moves the player could make; each move is represented by the board that results from it. playGameRandom(board, player) plays a random game starting from the given position and returns the winner. I have tested all the other code and I'm relatively certain it works as intended, so I think the problem is in my MCTS implementation.
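For anyone who wants to run the function below, here is a minimal sketch of the interface it relies on. This is not my actual game class: the TicTacToe class, the tuple board representation, the "X"/"O" player labels, and the winner helper are all assumptions made for illustration. Only the names moveList, otherPlayer, playGameRandom, and game come from my code.

```python
import random


class TicTacToe:
    """Hypothetical stand-in for the game class described above."""

    # All eight winning lines on a 3x3 board stored as a flat 9-tuple.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def otherPlayer(self, player):
        return "O" if player == "X" else "X"

    def winner(self, board):
        # Return "X" or "O" if a line is complete, otherwise None.
        for a, b, c in self.LINES:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def moveList(self, board, player):
        # Each move is represented by the board that results from it.
        # A finished game (win or full board) has no moves.
        if self.winner(board) is not None:
            return []
        return [board[:i] + (player,) + board[i + 1:]
                for i, cell in enumerate(board) if cell is None]


game = TicTacToe()


def playGameRandom(board, player):
    """Play uniformly random moves with `player` to move first.

    Returns the winner, or None for a draw (an assumed convention).
    """
    while True:
        moves = game.moveList(board, player)
        if not moves:
            return game.winner(board)
        board = random.choice(moves)
        player = game.otherPlayer(player)
```

With this sketch, an empty board is (None,) * 9, and a call such as playGameRandom((None,) * 9, "X") returns "X", "O", or None.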
    import random

    def MCTS(board, player, depth):
        # game and playGameRandom are defined elsewhere (see the explanation above)
        # get the list of moves; each entry is the board that results from a move
        ml = game.moveList(board, player)
        # store the opposing player
        otherPlayer = game.otherPlayer(player)
        w = [0] * len(ml)
        # play (depth) random games from each starting move
        for i in range(len(ml)):
            for j in range(depth):
                # after our move it is the opponent's turn
                winner = playGameRandom(ml[i], otherPlayer)
                if winner == player:
                    # weight for a win
                    w[i] = w[i] + 1
                elif winner == otherPlayer:
                    # weight for a loss
                    w[i] = w[i] - 2
        # w now contains a score for each possible move
        # find the highest score
        m = max(w)
        # in case several moves tie for the max, pick randomly among them
        p = [i for i in range(len(w)) if w[i] == m]
        # return the board that corresponds to the best move
        bestMove = ml[random.choice(p)]
        return bestMove
As I've stated, it works, but not as well as I hoped. It still misses things and occasionally makes poor decisions, even when depth is very high (~100,000 random games per move). What's the problem with my implementation? Apologies in advance: I'm relatively new to all of this, and I think I've bitten off more than I can chew here.