How to call a minimax method (with alpha-beta pruning) properly

Question

This is my minimax method, which implements alpha-beta pruning and memoization:

public int[] newminimax499(int a, int b){
    int bestPos=-1;
    int alpha= a;
    int beta= b;
    int currentScore;
    //boardShow();
    String stateString = "";                                                
    for (int i=0; i<state.length; i++) 
        stateString += state[i];                        
    int[] oldAnswer = oldAnswers.get(stateString);                          
    if (oldAnswer != null) 
        return oldAnswer;
    if(isGameOver2()!='N'){
        int[] answer = {score(), bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else{
        for(int x:getAvailableMoves()){
            if(turn=='O'){  //O is maximizer
                setO(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                //revert(x);
                if(currentScore>alpha){
                    alpha=currentScore;
                    bestPos=x;
                }
                /*if(alpha>=beta){
                    break;
                }*/
            }
            else {  //X is minimizer
                setX(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                //revert(x);
                if(currentScore<beta){
                    beta=currentScore;
                    bestPos=x;
                }
                /*if(alpha>=beta)
                    break;*/
            }
            revert(x);
            if(alpha>=beta)
                break;
        }
    }
    if(turn=='O'){ 
        int[] answer = {alpha, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else {
        int[] answer = {beta, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
}

As a test game, in my main method I place an X somewhere (X is the player) and then call newminimax499 to see where I should place O (the computer):

public static void main(String[] args) {
    State3 s=new State3(3);
    int [] result=new int[2];
    s.setX(4);
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);
    System.out.println("Score: "+result[0]+" Position: "+ result[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();
}

The method returns the position where the computer should play its O (in this scenario it's 6), so I place O as instructed, play an X for myself, then call newminimax499 and run the code again to see where O wants to play, and so on.

public static void main(String[] args) {
    State3 s=new State3(3);
    int [] result=new int[2];
    s.setX(4);
    s.setO(6);//Position returned from previous code run
    s.setX(2);
    s.setO(8);//Position returned from previous code run
    s.setX(3);
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);
    System.out.println("Score: "+result[0]+" Position: "+ result[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();
}

After this particular run I get the result

Score: 10 Position: 7

Which is good. However, in my GUI this isn't how newminimax499 gets called. There, the board doesn't get reset every time a new X or O is placed. If I were to put it in a main method like in the previous examples, it would look something like this (keep in mind that it's the exact same sequence of input):

public static void main(String[] args) {
    State3 s=new State3(3);
    int [] result=new int[2];
    s.setX(4); //Player makes his move
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);//Where should pc play?
    s.setO(result[1]);//PC makes his move
    s.setX(2);//Player makes his move
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);//Where should PC make his move?
    s.setO(result[1]);//PC makes his move
    s.setX(3);//Player makes his move
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);
    System.out.println("Score: "+result[0]+" Position: "+ result[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();
}

Now, when the method is called this way (which is how it's called in the GUI), it returns:

Score: 0 Position: 5

Which means that instead of taking the winning move, it blocked the opponent. After playing a few games this way, it became clear that the PC actually loses. So why do these two ways of calling newminimax499 return different results?

This is how it looks in the GUI: (screenshot not reproduced)

Note: All methods needed to run the program can be found in this post.

score 2 · Answer 1 · answered Aug 29 '15 at 07:13

The problem you have encountered here is the same one that arises in chess with transposition tables and alpha-beta pruning. I have to contradict you on the point that they are incompatible!

As I suggested multiple times before, please read the corresponding chessprogramming wiki articles before you try to implement something!

In order to make memoization and alpha-beta work together, you have to save a flag for every position in your memo table that differentiates between alpha-cut nodes, beta-cut nodes and precise (exact) nodes.

And believe me, I know from experience that they work together ;)
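As a minimal sketch of what such a flag could look like (the names MemoTable, Bound, Entry, classify, store and lookup are hypothetical and not part of the State3 class from the question): each entry records whether its score is exact, a lower bound or an upper bound, decided by comparing the returned score with the window the node was entered with.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: MemoTable, Bound and Entry are hypothetical names,
// not part of the State3 class from the question.
final class MemoTable {
    // How a stored score relates to the true minimax value of a position.
    enum Bound { EXACT, LOWER, UPPER }

    static final class Entry {
        final int score;
        final Bound bound;
        final int bestPos;

        Entry(int score, Bound bound, int bestPos) {
            this.score = score;
            this.bound = bound;
            this.bestPos = bestPos;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Classify a result against the window the node was ENTERED with,
    // i.e. alpha and beta before the move loop changed them.
    static Bound classify(int score, int alphaOrig, int betaOrig) {
        if (score <= alphaOrig) return Bound.UPPER; // fail low: true value is at most score
        if (score >= betaOrig) return Bound.LOWER;  // fail high: true value is at least score
        return Bound.EXACT;                         // strictly inside the window
    }

    void store(String key, int score, int alphaOrig, int betaOrig, int bestPos) {
        entries.put(key, new Entry(score, classify(score, alphaOrig, betaOrig), bestPos));
    }

    Entry lookup(String key) {
        return entries.get(key); // null if the position has not been stored
    }
}
```

In newminimax499 this would presumably mean keeping the incoming a and b around and passing them to something like store, rather than putting {alpha, bestPos} or {beta, bestPos} into oldAnswers as if the score were always exact.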

xXliolauXx


I have looked through the chessprogramming wiki, but couldn't really find anything that was really directly relevant to my TicTacToe problems. But that may have been just me, since I didn't search that thoroughly. In any case thanks for the heads up. So it was just the way I implemented it that made them incompatible. – Omar Sharaki Aug 31 '15 at 11:03
@Omar: Here is the link I meant: https://chessprogramming.wikispaces.com/Transposition+Table that's the chess-equivalent to your memoization. In the part "Table Entry Types" you will find the problem I meant. – xXliolauXx Aug 31 '15 at 17:10
So I finally looked through the chess wiki :) Lots of info. However, if I may narrow down all the things I don't get into 2 questions: 1) How is my program supposed to react when it finds a matching node in the table if the node is a) an alpha-cut, b) a beta-cut? And 2) how should I implement the idea of flags in the transposition table representing the different types of cuts? – Omar Sharaki Dec 30 '15 at 18:53
@Omar For Q1: You then can't treat the TT values as "exact", but rather as a minimum or maximum value, respectively. So a minimum-value entry could be useful to create a cutoff for player Min. – xXliolauXx Jan 01 '16 at 10:27
@Omar Q2: Either add booleans to every entry or some other attribute you can store that information in. – xXliolauXx Jan 01 '16 at 10:30
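
The two replies above can be sketched as a lookup routine. This is only an illustration under assumptions: ProbeSketch, Bound, usableScore and narrow are hypothetical names, and scores are taken to be absolute (O maximizes, X minimizes) as in the question.

```java
import java.util.OptionalInt;

// Sketch only: all names here are hypothetical, not from the question's code.
final class ProbeSketch {
    enum Bound { EXACT, LOWER, UPPER }

    // Returns a score the search may return immediately,
    // or empty if the position still has to be searched.
    static OptionalInt usableScore(int storedScore, Bound bound, int alpha, int beta) {
        switch (bound) {
            case EXACT:
                return OptionalInt.of(storedScore);
            case LOWER: // true value >= storedScore
                if (storedScore >= beta) return OptionalInt.of(storedScore);
                break;
            case UPPER: // true value <= storedScore
                if (storedScore <= alpha) return OptionalInt.of(storedScore);
                break;
        }
        return OptionalInt.empty();
    }

    // When no immediate return is possible, a bound can still tighten the window.
    // Returns {alpha, beta} after narrowing.
    static int[] narrow(int storedScore, Bound bound, int alpha, int beta) {
        if (bound == Bound.LOWER) alpha = Math.max(alpha, storedScore);
        if (bound == Bound.UPPER) beta = Math.min(beta, storedScore);
        return new int[] { alpha, beta };
    }
}
```

With the window (-10, 10), a stored lower bound of 0 is not enough to return early, but it does allow the search to continue with alpha raised to 0.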

score 0 · Answer 2 · answered Aug 23 '15 at 11:18

After playing around with a bunch of ideas I finally found the answer, so I might as well post it. The method in question here, newminimax499, is trying to implement both memoization AND alpha-beta pruning. For some reason it seems that these 2 utilities are incompatible (or at least my implementation of them makes them incompatible). After removing the parts related to memoization, the method becomes a pure alpha-beta pruning minimax algorithm, works fine, and looks like this:

public int[] newminimax499(int alpha, int beta){
    int bestPos=-1;
    int currentScore;
    if(isGameOver2()!='N'){
        int[] answer = {score(), bestPos};                                    
        return answer;
    }
    else{
        for(int x:getAvailableMoves()){
            if(turn=='O'){  //O is maximizer
                setO(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                if(currentScore>alpha){
                    alpha=currentScore;
                    bestPos=x;
                }
            }
            else {  //X is minimizer
                setX(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                if(currentScore<beta){
                    beta=currentScore;
                    bestPos=x;
                }
            }
            revert(x);
            if(alpha>=beta)
                break;
        }
        if(turn=='O'){ 
            int[] answer = {alpha, bestPos};                                    
            return answer;
        }
        else {
            int[] answer = {beta, bestPos};                                    
            return answer;
        }
    }
}

Not only does this method now work (however you call it in the main method), but it's also much faster than a minimax with memoization. This method calculates the 2nd move in a 4x4 game in a mere 7 seconds, whereas a minimax which implements memoization calculates it in about 23 seconds.
