Minimax algorithm in Ruby in Object-Oriented way

Question

I am trying to implement the minimax algorithm for a tic-tac-toe game in an object-oriented way. I got stuck on how to revert the state of my board object once the algorithm has determined the best move. When running the program, I noticed that the minimax method operates on the current board object, which is not ideal.

I added a method to undo the move done by the minimax method: board.[]=(empty_square, Square::INITIAL_MARKER)

I have noticed that the algorithm makes the wrong choice. Here, X is the human player and O is the computer. Suppose this is the state of the board:

     |     |
     |     |   
     |     |
-----+-----+-----
     |     |
     |  X  |   
     |     |
-----+-----+-----
     |     |
     |     |  O
     |     |

When player X moves and picks square 2, minimax (the computer, O) chooses 7 instead of 8, which would be the better choice:

     |     |
     |  X  |   
     |     |
-----+-----+-----
     |     |
     |  X  |   
     |     |
-----+-----+-----
     |     |
  O  |     |  O
     |     |

Due to my inexperience, I am a little bit lost on how to proceed and would appreciate any guidance!

Here is the minimax method:

  def minimax
    best_move = 0
    score_current_move = nil
    best_score = -10000 if @current_marker == COMPUTER_MARKER
    best_score = 10000 if @current_marker == HUMAN_MARKER
    board.unmarked_keys.each do |empty_square|
      board.[]=(empty_square, @current_marker)
      if board.full?
        score_current_move = 0
      elsif board.someone_won?
        score_current_move = -1 if board.winning_marker == HUMAN_MARKER
        score_current_move = 1 if board.winning_marker == COMPUTER_MARKER
      else
        alternate_player
        score_current_move = minimax[0]
      end
      if ((@current_marker == COMPUTER_MARKER) && (score_current_move >= best_score))
        best_score = score_current_move
        best_move = empty_square
      elsif ((@current_marker == HUMAN_MARKER) && (score_current_move <= best_score))
        best_score = score_current_move
        best_move = empty_square
      end
      board.[]=(empty_square, Square::INITIAL_MARKER)
    end
    [best_score, best_move]
  end

Could you condense your problem a bit so that it is not required to read and understand 350 lines of code? Please try to reduce it to a minimal example showing your problem. Then describe exactly what you want to achieve, what exactly does not work with your current code (do you see incorrect behavior, exceptions / errors, anything else?). Please be as specific as possible and include any error messages and example code to specifically show your problem. See https://stackoverflow.com/help/minimal-reproducible-example for some guidelines. You can edit your question with the edit link below it. — Holger Just, Mar 07 '21 at 17:47
You might want to look into defining an `initialize_copy` method for your class(es), which should alter the behaviour of `dup`. — steenslag, Mar 07 '21 at 20:22

Cary Swoveland · Answer 1 · 2021-03-19T17:27:24.577

I see no particular advantage here to defining any classes at all. There is only one board and only two players (the machine and the human) who operate quite differently.

Main method

Next I will write the main method, which depends on several helper methods, all of which could be private.

def play_game(human_moves_first = true)
  raise ArgumentError unless [true, false].include?(human_moves_first)
  human_marker, machine_marker = 
    human_moves_first ? ['X', 'O'] : ['O', 'X']
  board = Array.new(9)

  if human_moves_first
    display(board)
    human_to_move(board, 'X')
  end

  loop do
    display(board)
    play = machine_best_play(board, machine_marker)   
    board[play] = machine_marker
    display(board)
    if win?(board)
      puts "Computer wins"
      break
    end
    if tie?(board)
      puts "Tie game"
      break
    end
    human_to_move(board, human_marker)
    if tie?(board)
      puts "Tie game"
      break
    end
  end
end

As you can see, I have provided a choice of who starts: the machine or the human.

Initially, board is an array of 9 nils.

The method simply loops until a determination is made as to whether the machine wins or there is a tie. As we know, the machine, acting logically, cannot lose. In each pass of the loop the machine makes a mark. If that results in a win or a tie the game is over; else the human is called upon to make a mark.

Before considering the method machine_best_play, let's consider a few simple helper methods that are needed.

Simple helper methods

I will demonstrate these methods with board defined as follows:

board = ['X', 'O', 'X',
         nil, 'O', nil,
         nil, nil, 'X']

Note that while the human refers to the nine locations as 1 through 9, internally they are represented as indices of board, 0 through 8.

Determine unmarked cells

def unmarked_cells(board)
  board.each_index.select { |i| board[i].nil? }
end

unmarked_cells(board)
  #=> [3, 5, 6, 7]

Ask human to make a selection

def human_to_move(board, marker)
  loop do
    puts "Please mark '#{marker}' in an unmarked cell"
    cell = gets.chomp
    if (n = Integer(cell, exception: false)) && n.between?(1, 9)
      n -= 1 # convert to index in board
      if board[n].nil?
        board[n] = marker
        break
      else
        puts "That cell is occupied"
      end
    else
      puts "That is not a number between 1 and 9"
    end
  end
end

human_to_move(board, 'O')  
Please mark 'O' in an unmarked cell

If cell = gets.chomp #=> "6" then

board
  #=> ["X", "O", "X", nil, "O", "O", nil, nil, "X"]

For the following I have set board to its original value above.

Display the board

def display(board)
  board.each_slice(3).with_index do |row, idx|
    puts "     |     |"
    puts "  #{row.map { |obj| obj || ' ' }.join('  |  ')}"
    puts "     |     |"
    puts "-----+-----+-----" unless idx == 2
  end
end

display(board)
     |     |
  X  |  O  |  X
     |     |
-----+-----+-----
     |     |
     |  O  |   
     |     |
-----+-----+-----
     |     |
     |     |  X
     |     |

Determine if the last move (by the machine or human) wins

WINNING_CELL_COMBOS = [
  [0,1,2], [3,4,5], [6,7,8], [0,3,6], [1,4,7], [2,5,8], [0,4,8], [2,4,6]
]

def win?(board)
  WINNING_CELL_COMBOS.any? do |arr|
    # A combo wins when its first cell is marked and all three cells match.
    (f = board[arr.first]) && board.values_at(*arr) == [f, f, f]
  end
end

win? board
  #=> false

win? ['X', nil, 'O', nil, 'X', 'O', nil, nil, 'X']
  #=> true

win? ['X', nil, 'O', nil, 'X', 'O', 'X', nil, 'O']
  #=> true

Determine if game ends in a tie

def tie?(board)
  unmarked_cells(board).empty?
end

tie?(board)
  #=> false

tie? ['X', 'X', 'O', 'O', 'X', 'X', 'X', 'O', 'O']
  #=> true

Note that unmarked_cells(board).empty? can be replaced with board.all?, because every marked cell holds a string (truthy) and every unmarked cell holds nil.
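As a quick check of that equivalence, here is a minimal sketch that compares the two expressions on a full board and on a partly filled one. unmarked_cells is repeated only so the snippet runs on its own; the variable names full and partial are just for illustration.

```ruby
# Repeated from above so that the snippet is self-contained.
def unmarked_cells(board)
  board.each_index.select { |i| board[i].nil? }
end

full    = ['X', 'X', 'O', 'O', 'X', 'X', 'X', 'O', 'O']
partial = ['X', 'O', 'X', nil, 'O', nil, nil, nil, 'X']

# Array#all? without a block is true when no element is nil or false.
[full, partial].map { |b| [unmarked_cells(b).empty?, b.all?] }
  #=> [[true, true], [false, false]]
```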

Determine machine's best play using minimax algorithm

MACHINE_WINS = 0
TIE = 1
MACHINE_LOSES = 2
NEXT_MARKER = { "X"=>"O", "O"=>"X" }

def machine_best_play(board, marker)
  plays = unmarked_cells(board)
  # Pick the play with the lowest (best for the machine) outcome code.
  plays.min_by do |play|
    board_after_play = board.dup.tap { |a| a[play] = marker }
    if win?(board_after_play)
      MACHINE_WINS
    elsif plays.size == 1
      TIE
    else
      human_worst_outcome(board_after_play, NEXT_MARKER[marker])
    end
  end
end

This requires two more methods.

Determine machine's best worst outcome for current state of board

def machine_worst_outcome(board, marker)
  plays = unmarked_cells(board)
  # The machine moves here, so it takes the smallest outcome code.
  plays.map do |play|
    board_after_play = board.dup.tap { |a| a[play] = marker }
    if win?(board_after_play)
      MACHINE_WINS
    elsif plays.size == 1
      TIE
    else
      human_worst_outcome(board_after_play, NEXT_MARKER[marker])
    end
  end.min
end

Determine human's best worst outcome for current state of board assuming human also plays a minimax strategy

def human_worst_outcome(board, marker)
  plays = unmarked_cells(board)
  # The human moves here, so the largest (worst for the machine) code is taken.
  plays.map do |play|
    board_after_play = board.dup.tap { |a| a[play] = marker }
    if win?(board_after_play)
      MACHINE_LOSES
    elsif plays.size == 1
      TIE
    else
      machine_worst_outcome(board_after_play, NEXT_MARKER[marker])
    end
  end.max
end

Notice that the human maximizes the worst outcome from the machine's perspective whereas the machine minimizes its worst outcome.
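To see the min/max alternation at work on the position from the question (X on squares 2 and 5, O on square 9, O to move), here is a self-contained sketch. It folds the two mutually recursive methods into one, which is a simplification of mine rather than the code above; the names LINES, won? and outcome are hypothetical.

```ruby
# Self-contained sketch; LINES, won? and outcome are hypothetical names.
LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
         [0, 3, 6], [1, 4, 7], [2, 5, 8],
         [0, 4, 8], [2, 4, 6]].freeze
MACHINE_WINS = 0
TIE = 1
MACHINE_LOSES = 2
NEXT_MARKER = { 'X' => 'O', 'O' => 'X' }.freeze

def won?(board, marker)
  LINES.any? { |line| board.values_at(*line).all?(marker) }
end

# Outcome code, from the machine's point of view, of the position reached
# after last_marker has just been placed, assuming both sides play minimax.
def outcome(board, last_marker, machine)
  if won?(board, last_marker)
    return last_marker == machine ? MACHINE_WINS : MACHINE_LOSES
  end
  free = board.each_index.select { |i| board[i].nil? }
  return TIE if free.empty?

  mover = NEXT_MARKER[last_marker]
  results = free.map do |i|
    outcome(board.dup.tap { |b| b[i] = mover }, mover, machine)
  end
  # The machine picks the smallest code, the human the largest.
  mover == machine ? results.min : results.max
end

board = [nil, 'X', nil,
         nil, 'X', nil,
         nil, nil, 'O']
free = board.each_index.select { |i| board[i].nil? }
outcomes = free.to_h do |i|
  [i, outcome(board.dup.tap { |b| b[i] = 'O' }, 'O', 'O')]
end
p outcomes
outcomes.min_by { |_cell, code| code }.first
  #=> 7
```

Only index 7 (square 8, the blocking move) is scored TIE; every other play is scored MACHINE_LOSES, because X can then complete the middle column.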

Almost there

All that remains is to quash any bugs that are present. Being short of time at the moment I will leave that to you, should you wish to do so. Feel free to edit my answer to make any corrections.

Thank you for your very detailed answer! The challenge with OOP is that duplicating the board state is not very straightforward. I managed to implement the algorithm in a functional way, but not in an OOP way, and I am curious how it can be done in OOP. I have already improved my algorithm and implemented a method to undo the move. Now I see that my algorithm just chooses the best move, but not the best worst move. — itiswhatitis, Mar 19 '21 at 15:56
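On duplicating board state in an object-oriented design, one possible approach is the initialize_copy hook mentioned in the comments on the question. The sketch below is only an illustration under my own assumptions: the Board and Minimax classes and all their method names are hypothetical and are not the classes from the question. Each candidate move is tried on a copy, so the live board is never modified and no undo step is needed.

```ruby
# Hypothetical classes for illustration; not the question's Board/Square code.
class Board
  LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
           [0, 3, 6], [1, 4, 7], [2, 5, 8],
           [0, 4, 8], [2, 4, 6]].freeze

  def initialize
    @cells = Array.new(9)
  end

  # Called by dup and clone. At this point the copy still shares the
  # original's array, so give it an array of its own.
  def initialize_copy(source)
    super
    @cells = @cells.dup
  end

  def [](index)
    @cells[index]
  end

  def []=(index, marker)
    @cells[index] = marker
  end

  def unmarked
    @cells.each_index.select { |i| @cells[i].nil? }
  end

  # Marker of the winner, or nil if nobody has won.
  def winner
    LINES.each do |a, b, c|
      m = @cells[a]
      return m if m && m == @cells[b] && m == @cells[c]
    end
    nil
  end

  # A new board with one more mark; the receiver is left untouched.
  def with_mark(index, marker)
    dup.tap { |copy| copy[index] = marker }
  end
end

class Minimax
  OTHER = { 'X' => 'O', 'O' => 'X' }.freeze

  def initialize(machine_marker)
    @machine = machine_marker
  end

  # Index of the cell with the best guaranteed score for the machine.
  def best_move(board)
    board.unmarked.max_by do |i|
      score(board.with_mark(i, @machine), OTHER[@machine])
    end
  end

  private

  # 1 = machine wins, 0 = tie, -1 = machine loses; to_move plays next.
  def score(board, to_move)
    winner = board.winner
    return (winner == @machine ? 1 : -1) if winner
    free = board.unmarked
    return 0 if free.empty?

    scores = free.map { |i| score(board.with_mark(i, to_move), OTHER[to_move]) }
    to_move == @machine ? scores.max : scores.min
  end
end

board = Board.new
board[1] = 'X'
board[4] = 'X'
board[8] = 'O'
Minimax.new('O').best_move(board)
  #=> 7
board.unmarked
  #=> [0, 2, 3, 5, 6, 7]
```

Copying costs a little more than marking and unmarking a single board, but it removes the need to restore the board (and the current player) by hand after each recursive call.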
