Debugging Recursive MinMax in TicTacToe

Question

I'm trying to get the minmax algorithm (computer AI) to work in my game of tic-tac-toe, and I've been stuck on this for days. I don't understand why the computer AI simply places its marker ("O") on the board positions in sequential order, from 0 to 8.

For example, as the human player, if I choose 1, then the computer will choose 0:

 O| X| 2
--+---+--
 3| 4| 5
--+---+--
 6| 7| 8

Next, if I choose 4, then the computer will choose 2:

 O| X| O
--+---+--
 3| X| 5
--+---+--
 6| 7| 8

And so on:

 O| X| O
--+---+--
 O| X| O
--+---+--
 X| 7| X

I've debugged the minmax algorithm as much as I can, but it's getting really hard to follow what's going on.

Here's the ComputerPlayer class with the algorithm (without all my print statements). The minmax method is where I'm having the most trouble. (I'm not 100% sure about using worst_score, or even about the associated logic.)

class ComputerPlayer < Player
  def move(game_board)
    minmax(game_board) #minmax to create @best_move

    game_board.place_piece(@best_move, marker)
  end

  def minmax(board, player_tracker = 0) 
    if board.game_over?
      return score(board)
    else
      worst_score  = (1.0/0.0) #Infinity
      best_score  = -(1.0/0.0) #-Infinity
      @best_move  = board.get_available_positions.first

      new_marker = player_tracker.even? ? 'O' : 'X'
      player_tracker += 1

      board.get_available_positions.each do |move|
        new_board = board.place_piece(move, new_marker)
        current_score = minmax(new_board, player_tracker)
        if new_marker == marker #if the player is the computer player
          if current_score > best_score
            @best_move = move
            best_score = current_score
          end
        else
          if current_score < worst_score
            worst_score = current_score
          end
        end
      end
    end
    return best_score
  end

  def score(board)
    if board.winner == "O" #'O' == 'O', 'nil' == 'O'
      10
    elsif board.winner == "X" #'X' != 'O', 'nil' != 'O'
      -10
    elsif board.winner == nil
      0
    end
  end
end

score 3 · Accepted Answer · answered Apr 24 '15 at 06:27

The problem is that minmax always returns best_score, even when the simulated player is the human. At those levels only worst_score is ever updated, so best_score is still -Infinity when it is returned.

The minmax routine constantly toggles between the two players. When the current player being simulated is the computer player, the best score is the highest score; when the current player being simulated is the human player, the best score is the lowest score.
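A minimal sketch of that toggle, keeping the question's best_score/worst_score names: each level records both the highest and the lowest child score, then returns the one that belongs to the player moving at that level. It assumes a plain 9-element array (nil for an empty square) in place of the question's Board class, with 'O' as the computer; the helpers winner, minmax and best_move here are illustrative, not the asker's API.

```ruby
# Sketch only: a plain 9-element array stands in for the question's Board.
# nil = empty square; 'O' = computer (maximizer), 'X' = human (minimizer).
LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
         [0, 3, 6], [1, 4, 7], [2, 5, 8],
         [0, 4, 8], [2, 4, 6]].freeze

# Returns 'O', 'X', or nil if nobody has three in a row.
def winner(cells)
  LINES.each do |a, b, c|
    return cells[a] if cells[a] && cells[a] == cells[b] && cells[a] == cells[c]
  end
  nil
end

# Score of the position; player_tracker even = 'O' to move, odd = 'X' to move.
def minmax(cells, player_tracker = 0)
  case winner(cells)
  when 'O' then return 10
  when 'X' then return -10
  end
  moves = (0..8).select { |i| cells[i].nil? }
  return 0 if moves.empty? # full board, no winner: draw

  new_marker = player_tracker.even? ? 'O' : 'X'
  best_score = -Float::INFINITY # highest child score (what 'O' wants)
  worst_score = Float::INFINITY # lowest child score (what 'X' wants)

  moves.each do |move|
    child = cells.dup # copy, so sibling moves never see this piece
    child[move] = new_marker
    current_score = minmax(child, player_tracker + 1)
    best_score = current_score if current_score > best_score
    worst_score = current_score if current_score < worst_score
  end

  # The fix: return the score of the player who is actually moving here.
  new_marker == 'O' ? best_score : worst_score
end

# Root helper: choose the move whose resulting position is best for 'O'.
def best_move(cells)
  moves = (0..8).select { |i| cells[i].nil? }
  moves.max_by do |move|
    child = cells.dup
    child[move] = 'O'
    minmax(child, 1) # tracker 1: the human moves next
  end
end

board = ['O', 'O', nil, 'X', 'X', nil, nil, nil, nil]
best_move(board) # => 2 (completes the top row)
```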

I rewrote the routine to try all remaining moves for an iteration and keep track of the corresponding score in a local hash. When finished, the best score is returned and the best move is set, depending on the currently simulated player.

def minmax(board, player_tracker = 0, iteration = 0)
  if board.game_over?
    return score(board, iteration)
  end

  new_marker = player_tracker.even? ? 'O' : 'X'

  scores = {}
  board.get_available_positions.each do |move|
    new_board = board.place_piece(move, new_marker)
    scores[move] = minmax(new_board, player_tracker + 1, iteration + 1)
  end

  if player_tracker.even?
    @best_move = scores.max_by { |_move, value| value }.first # computer: highest score
  else
    @best_move = scores.min_by { |_move, value| value }.first # human: lowest score
  end

  return scores[@best_move]
end
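The rewritten minmax calls board.place_piece once per candidate move on the same board, so it only works if place_piece returns a new board and leaves the original untouched. The question doesn't show its Board class, so this sketch assumes a hypothetical minimal Board with exactly that behaviour, and wraps the two rewritten routines in a hypothetical standalone ComputerPlayerSketch class (the Player superclass isn't shown either).

```ruby
# Hypothetical Board (the question's Board class is not shown).
# Key assumption: place_piece returns a NEW board and does not modify self.
class Board
  LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6],
           [1, 4, 7], [2, 5, 8], [0, 4, 8], [2, 4, 6]].freeze

  def initialize(cells = Array.new(9))
    @cells = cells # nil = empty square
  end

  def place_piece(position, marker)
    cells = @cells.dup
    cells[position] = marker
    Board.new(cells)
  end

  def get_available_positions
    (0..8).select { |i| @cells[i].nil? }
  end

  def winner
    LINES.each do |a, b, c|
      return @cells[a] if @cells[a] && @cells[a] == @cells[b] && @cells[a] == @cells[c]
    end
    nil
  end

  def game_over?
    !winner.nil? || get_available_positions.empty?
  end
end

# The answer's minmax and score, in a hypothetical standalone class.
class ComputerPlayerSketch
  attr_reader :best_move

  def minmax(board, player_tracker = 0, iteration = 0)
    return score(board, iteration) if board.game_over?

    new_marker = player_tracker.even? ? 'O' : 'X'
    scores = {}
    board.get_available_positions.each do |move|
      new_board = board.place_piece(move, new_marker)
      scores[move] = minmax(new_board, player_tracker + 1, iteration + 1)
    end

    # Deeper calls overwrite @best_move too, but the root assigns it last.
    @best_move = if player_tracker.even?
                   scores.max_by { |_move, value| value }.first
                 else
                   scores.min_by { |_move, value| value }.first
                 end
    scores[@best_move]
  end

  def score(board, iteration)
    if board.winner == 'O'
      10.0 / iteration
    elsif board.winner == 'X'
      -10.0 / iteration
    else
      0 # draw
    end
  end
end

ai = ComputerPlayerSketch.new
ai.minmax(Board.new(['O', 'O', nil, 'X', 'X', nil, nil, nil, nil]))
ai.best_move # => 2
```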

To increase accuracy even further, I rewrote the score routine to also consider the number of iterations needed to reach the board being scored. Being able to win in 1 iteration should be preferred over winning in 3 iterations, right?

def score(board, iteration)
  # board.winner is "O", "X", or nil (draw)
  if board.winner == "O"
    10.0 / iteration
  elsif board.winner == "X"
    -10.0 / iteration
  elsif board.winner == nil
    0
  else
    raise "ERROR"
  end
end
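A quick check of the depth weighting, using a hypothetical standalone weighted_score that mirrors the formula above: the same result found sooner lands further from zero, so the maximizing computer prefers fast wins and, symmetrically, losses that come later. The formula assumes iteration is at least 1; at iteration 0, Ruby's float division returns Infinity rather than raising.

```ruby
# Hypothetical helper mirroring the answer's score formula:
# +10/iteration for an 'O' win, -10/iteration for an 'X' win, 0 for a draw.
def weighted_score(winner, iteration)
  case winner
  when 'O' then 10.0 / iteration
  when 'X' then -10.0 / iteration
  else 0
  end
end

weighted_score('O', 1) # => 10.0 (computer wins with its first move)
weighted_score('O', 3)  # win two plies later: a smaller positive score
weighted_score('X', 2) # => -5.0 (human wins right after the computer's move)
weighted_score('X', 4) # => -2.5 (a later loss is closer to 0)
weighted_score(nil, 9) # => 0 (draw)
```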

With these two routines replaced, the moves chosen by the computer seem much more logical.

So the scores hash maintains all the scores for BOTH players? — funfuntime, Apr 27 '15 at 06:05
Scores is a local variable which will keep track of all scores within an iteration. So for each iteration, it will keep track of all scores for a single player. — Philip Bijker, Apr 28 '15 at 05:13
