
I have a homework assignment that uses MCTS (http://mcts.ai/code/python.html) to play as many games of tic-tac-toe as required. The goal of the assignment is to train a decision tree classifier that predicts the best action to take given the current state of the game and the player to move. Each position in the tic-tac-toe grid is marked 1.0 or 2.0 depending on which player has taken it, or 0.0 if it is still empty. So far I have managed to save the data to CSV in a format like this:

   Unnamed: 0  player    0    1    2  ...    6    7    8  best_move  won
0           0     1.0  0.0  0.0  0.0  ...  0.0  0.0  0.0          4    0
My first and main question is: how can I build a decision tree classifier with scikit-learn that incorporates all equivalent states, i.e. the root should offer nine possible moves to the first player, then eight to the second player, and so on, alternating between players (1.0 for player 1, 2.0 for player 2)? The second, related question is: how can I represent data that repeats in blocks of nine rows (moves 0-8) over and over again, so that after the ninth row has been read the tree starts over from the root with the next game? Ideally, sub-states that are identical for player 1 or for player 2 should be grouped together.
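To make the second question more concrete, this is roughly how I picture the rows being grouped back into individual games, by detecting where the saved index column resets to 0 (an untested sketch; split_into_games is just a name I made up and is not part of my current code):

import pandas as pd

def split_into_games(df):
    # Split the flat move log into one DataFrame per game.
    # A new game starts whenever the index column written by to_csv resets to 0,
    # so each game is a block of at most nine consecutive rows.
    games = []
    start = 0
    index_col = df.iloc[:, 0]          # the unnamed index column from to_csv
    for row in range(1, len(df)):
        if index_col.iloc[row] == 0:   # index reset -> a new game begins here
            games.append(df.iloc[start:row])
            start = row
    games.append(df.iloc[start:])      # the final game
    return games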

Here is the PDF view of the tree generated by my code. Below is the code I use to train the decision tree.

import graphviz
import pandas as pd
from sklearn import tree

def visualise_tree(trained_tree):
    # Export the fitted tree to Graphviz DOT format and render it as oxo.pdf
    dot_data = tree.export_graphviz(trained_tree, out_file=None)
    graph = graphviz.Source(dot_data)
    graph.render("oxo")

def trainTree(read_csv):
    clf = tree.DecisionTreeClassifier()
    # Features: the player to move plus the nine board cells
    slice_training_data = read_csv[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]]
    # Target: the move chosen by MCTS (a single column, so fit gets a 1-D target)
    slice_prediction_data = read_csv["best_move"]
    clf.fit(slice_training_data, slice_prediction_data)
    visualise_tree(clf)
    print(read_csv)

if __name__ == "__main__":
    """ Play a single game to the end using UCT for both players. 
    """
    #df = pd.DataFrame(columns=["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"])
    #for i in range(1):
    #    df = UCTPlayGame(df)
    read_csv = pd.read_csv('10000games.csv')
    trainTree(read_csv)
    #df = df[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"]]
    #print(df)
    #df.to_csv('10000games.csv')
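For reference, this is roughly how I plan to query the trained tree for a new board state. It is only a sketch: it assumes trainTree is changed to return clf, and the column order must match the training features exactly.

# Sketch only: assumes trainTree is modified to `return clf`
clf = trainTree(read_csv)

# Example query: player 1 to move on an empty board
new_state = pd.DataFrame(
    [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    columns=["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"])
print(clf.predict(new_state))   # prints the predicted best_move for this state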

Here is the format of the data:

   ,player,0,1,2,3,4,5,6,7,8,best_move,won
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4,0
1,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0
2,1.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0
3,2.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,7,0
4,1.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,2.0,0.0,3,0
5,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,5,0
6,1.0,2.0,1.0,0.0,1.0,1.0,2.0,0.0,2.0,0.0,2,0
7,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,2.0,0.0,6,0
8,1.0,2.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,0.0,8,0
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0

As you can see, nine moves are made and then the dataset repeats for a new game (the index starts again from 0). The player column alternates between 1.0 and 2.0 as the players take turns to move. In addition to the requirements, I added a won column flagging the set of moves that wins a game (but I am unsure how to use it, so I did not include it in the prediction data). The decision tree should ideally merge all identical game states as described and predict what the best move should be.
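One idea I had for the won column (I am not sure it is the right approach) is to train only on moves made by the eventual winner, roughly like this inside trainTree. It assumes won ends up holding the winning player's number on every row of that game, with 0 for a draw; that is just my own convention, not part of the assignment.

# Keep only rows where the player to move went on to win that game
# (assumes `won` stores the winner's number, 0 for a draw -- my own convention)
winning_moves = read_csv[read_csv["won"] == read_csv["player"]]
clf.fit(winning_moves[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]],
        winning_moves["best_move"])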

  • https://pastebin.com/zZ1yCecx for a sample of the data – plgent Feb 17 '19 at 17:08
  • Could you show us what you've done so far please? And further, please don't include full code or links to pastebins - rather provide a minimum working (or not-working) example for us :) – Nico Albers Feb 17 '19 at 18:44
  • The code is too verbose to post - I can pastebin it, but that's about it – plgent Feb 17 '19 at 19:31
  • Okay, but the data could be compressed down to some minimum which is desired for understanding your problem, isn't it? Further, could you please show us a little bit of effort you made on your own - did you even try to build some decision tree based upon the documentation of scikit-learn? – Nico Albers Feb 18 '19 at 06:41
  • updated it to make it clearer – plgent Feb 18 '19 at 11:31
