0

I am trying to implement q-learning for tictactoe. One of the steps in doing so involves enumerating all the possible states of the tictactoe board to form a state-value table. I have written a procedure to recursively generate all possible states starting from the empty board. To do this, I am performing implicitly the pre-order traversal of the search space tree. However, at the end of it all, I am getting only 707 unique states whereas the general consensus is that the number of legal states is around 5000.

Note: I am referring to the number of legal states. I am aware that the number of states is closer to 19,000 if either player is allowed to continue playing after a game is completed (what I mean by an illegal state).

CODE:

def generate_state_value_table(self, state, turn):

    winner = int(is_game_over(state)) #check if, for the current turn and state, game has finished and if so who won
    #print "\nWinner is ", winner
    #print "\nBoard at turn: ", turn
    #print_board(state) 
    self.add_state(state, winner/2 + 0.5) #add the current state with the appropriate value to the state table
    open_cells = open_spots(state) #find the index (from 0 to total no. of cells) of all the empty cells in the board 

    #check if there are any empty cells in the board
    if len(open_cells) > 0:
        for cell in open_cells:
            #pdb.set_trace()
            row, col = cell / len(state), cell % len(state)
            new_state = deepcopy(state) #make a copy of the current state 

            #check which player's turn it is 
            if turn % 2 == 0:
                new_state[row][col] = 1
            else:
                new_state[row][col] = -1        

            #using a try block because recursive depth may be exceeded
            try:
                #check if the new state has not been generated somewhere else in the search tree
                if not self.check_duplicates(new_state):
                    self.generate_state_value_table(new_state, turn+1)
                else:
                    return
            except:
                #print "Recursive depth exceeded"
                exit()

    else:
        return

You can look at the full code here if you want.

EDIT: I tidied the code up a bit both in the link and here with more comments to make things clearer. Hope that helps.

SpiderWasp42
  • 2,526
  • 1
  • 12
  • 17

1 Answers1

0

So I finally solved the issue and I am putting up this answer for anyone who faces a similar issue. The bug was in the way I was handling duplicate states. If the new state generated was generated before somewhere else in the search tree, then it should not be added to the state table, but the mistake I made was in cutting short the pre-order traversal on finding a duplicate state when it should have gone one.

Simply put: removing the else clause from the code below gave me the number of states as 6046:

#check if the new state has not been generated somewhere else in the search tree
if not self.check_duplicates(new_state):
    self.generate_state_value_table(new_state, turn+1)
else:
    return

Furthermore, I stopped exploring the search tree branch when I encountered a state where there was a clear winner. Concretely, I added the following code after self.add_state(state, winner/2 + 0.5):

#check if the winner returned is one of the players and go back to the previous state if so
if winner != 0:
    return

This gave me the number of states as 5762 which is what I was looking for.

SpiderWasp42
  • 2,526
  • 1
  • 12
  • 17