I have a homework assignment that uses MCTS (http://mcts.ai/code/python.html) to play as many games of tic-tac-toe as required. The goal of the assignment is to train a decision tree classifier that predicts the best action to take given the current state of the game and the player whose turn it is. Each cell in the data holds 1.0 or 2.0 depending on which player has marked that position in the tic-tac-toe grid, or 0 if neither player has. So far I've managed to save the data to CSV in a format like this:
Unnamed: 0 player 0 1 2 ... 6 7 8 best_move won
0 0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 4 0
My first and main question: how can I build a decision tree classifier with scikit-learn that merges all equal states? That is, the root should offer nine decisions to the first player, the next level eight to the second player, and so on, alternating between players (1.0 for player 1, 2.0 for player 2). The second, related question: how can I handle data that repeats in blocks of nine rows (index 0-8), so that once the ninth row has been read, the next game starts again from the root? Ideally, sub-states that are identical for player 1 or player 2 would be grouped together.
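To make the grouping idea concrete, here is a rough sketch of what I have in mind, in plain Python so that it does not depend on my DataFrame layout. The helper `merge_equal_states` and the sample rows are hypothetical and not part of my actual code; it assumes each row can be reduced to (player, board, best_move) and that taking the most frequently recorded move for a state is an acceptable way to merge duplicates.

```python
from collections import Counter, defaultdict

def merge_equal_states(rows):
    """Collapse rows that share the same (player, board) into one entry.

    rows: iterable of (player, board, best_move), where board is a
    sequence of 9 cell values. Returns {(player, board): best_move},
    keeping the most frequently recorded best_move for each state.
    """
    votes = defaultdict(Counter)
    for player, board, best_move in rows:
        votes[(player, tuple(board))][best_move] += 1
    return {state: counts.most_common(1)[0][0] for state, counts in votes.items()}

# Made-up sample: the empty board shows up three times for player 1.
empty = (0.0,) * 9
centre_taken = (0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
sample = [
    (1.0, empty, 4),
    (1.0, empty, 4),
    (1.0, empty, 0),
    (2.0, centre_taken, 0),
]

merged = merge_equal_states(sample)

# One feature row per unique state: [player, cell0, ..., cell8].
X = [[player, *board] for (player, board) in merged]
y = [merged[state] for state in merged]
```

The resulting `X` and `y` could then be passed to `clf.fit(X, y)` in place of the raw rows, so that every distinct state appears only once in the training data.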
Here is the PDF view of the tree generated by my code. Below is the code that I use to train the decision tree.
import graphviz
import pandas as pd
from sklearn import tree

def visualise_tree(trained_tree):
    # Export the fitted tree to Graphviz DOT source and render it to a file named "oxo".
    dot_data = tree.export_graphviz(trained_tree, out_file=None)
    graph = graphviz.Source(dot_data)
    graph.render("oxo")

def trainTree(read_csv):
    clf = tree.DecisionTreeClassifier()
    # Features: the player to move plus the nine board cells.
    slice_training_data = read_csv[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]]
    # Target: the move chosen by MCTS for that state (1-D, one label per row).
    slice_prediction_data = read_csv["best_move"]
    clf.fit(slice_training_data, slice_prediction_data)
    visualise_tree(clf)
    print(read_csv)

if __name__ == "__main__":
    """ Play a single game to the end using UCT for both players.
    """
    # Data generation, commented out now that the CSV exists:
    #df = pd.DataFrame(columns=["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"])
    #for i in range(1):
    #    df = UCTPlayGame(df)
    read_csv = pd.read_csv('10000games.csv')
    trainTree(read_csv)
    #df = df[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"]]
    #print(df)
    #df.to_csv('10000games.csv')
Here is the format of the data:
,player,0,1,2,3,4,5,6,7,8,best_move,won
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4,0
1,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0
2,1.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0
3,2.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,7,0
4,1.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,2.0,0.0,3,0
5,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,5,0
6,1.0,2.0,1.0,0.0,1.0,1.0,2.0,0.0,2.0,0.0,2,0
7,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,2.0,0.0,6,0
8,1.0,2.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,0.0,8,0
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
As you can see, nine moves are made and then the dataset repeats for a new game (the index starts again at 0). The player column alternates between 1.0 and 2.0 as the players take turns. Beyond what the assignment requires, I also added a won column to mark a set of moves that wins the game, but I am unsure how to use it, so I didn't include it in the prediction data. Ideally, the decision tree should merge all identical starting game states as described and predict the best move.
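For the repeating 0-8 index, one thing I have considered is tagging every row with a game number that increases whenever the index falls back to 0. Below is a minimal sketch of that idea using only the standard `csv` module; `add_game_ids` is a hypothetical helper, the sample contains three of the rows shown above, and it assumes every game starts with index 0 in the unnamed first column.

```python
import csv
import io

# Three rows copied from the data above: two moves of one game,
# then the first move of the next game.
SAMPLE = """\
,player,0,1,2,3,4,5,6,7,8,best_move,won
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4,0
1,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
"""

def add_game_ids(lines):
    """Return the CSV rows as dicts with an extra "game" key.

    The unnamed first column (header "") holds the move counter,
    so a value of "0" marks the first move of a new game.
    """
    game_id = -1
    rows = []
    for row in csv.DictReader(lines):
        if row[""] == "0":
            game_id += 1
        row["game"] = game_id
        rows.append(row)
    return rows

games = add_game_ids(io.StringIO(SAMPLE))
for row in games:
    print(row["game"], row["player"], row["best_move"])
```

Detecting the reset rather than counting nine rows should also cope with games that end before the board is full, if the data contains any.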