Alpha beta pruning not producing good results

Question

---------------

Actual Question

---------------

OK, the real problem is not alpha-beta pruning versus minimax as such. Plain minimax assigns the best value only to the genuinely best children of the root, whereas alpha-beta returns the correct value for the root, but several children end up tied at that best value, and some of them should not have it.

I guess the ultimate question is: what is the most efficient way to get the best child of the root node (possibly several, in the case of a tie)?

The algorithm produces the correct value, but multiple nodes tie with that value, even though some of the moves are obviously wrong.

Example: tic-tac-toe

-|-|O
-|X|-
-|X|-

will produce the moves (0,1) and (1,0), both with a value of -0.06 under my heuristic.

(0,1) is the correct move, as it blocks my X's, but (1,0) is wrong, because on the next move I can put an X at (0,1) and win.

When I run the same algorithm without the

if(beta<=alpha)
    break;

it only returns (0,1) with value -0.06.

---------------

Originally posted question, now just sugar

---------------

I've spent days trying to figure out why my minimax algorithm works but stops working once I add alpha-beta pruning to it. I understand that the two should give the same results, and I even wrote a quick test of that. My question is: why doesn't my implementation produce the same results?

This is a tic-tac-toe implementation on Android. I can beat the algorithm sometimes when

if(beta<=alpha) break;

is not commented out, but when it is commented out it is undefeatable.

private static double minimax(Node<Integer,Integer> parent, int player, final int[][] board, double alpha, double beta, int depth) {
    List<Pair<Integer, Integer>> moves = getAvailableMoves(board);
    int bs = getBoardScore(board);
    if (moves.isEmpty() || Math.abs(bs) == board.length)//leaf node
        return bs+(player==X?-1:1)*depth/10.;
    double bestVal = player == X ? -Integer.MAX_VALUE : Integer.MAX_VALUE;
    for(Pair<Integer, Integer> s : moves){
        int[][] b = clone(board);
        b[s.getFirst()][s.getSecond()]=player;
        Node<Integer, Integer> n = new Node<>(bs,b.hashCode());
        parent.getChildren().add(n);
        n.setParent(parent);
        double score = minimax(n,player==O?X:O,b,alpha,beta, depth+1);
        n.getValues().put("score",score);
        n.getValues().put("pair",s);
        if(player == X) {
            bestVal = Math.max(bestVal, score);
            alpha = Math.max(alpha,bestVal);
        } else {
            bestVal = Math.min(bestVal, score);
            beta = Math.min(beta,bestVal);
        }
        /*
        If i comment these two lines out it works as expected
        if(beta<= alpha)
            break;
        */
    }
    return bestVal;
}

Now, this wouldn't be a problem for tic-tac-toe because of the small search tree, but I then developed it for checkers and noticed the same phenomenon.

private double alphaBeta(BitCheckers checkers, int depth, int absDepth, double alpha, double beta){
    if(checkers.movesWithoutAnything >= 40)
        return 0;//tie game//needs testing
    if(depth == 0 || checkers.getVictoryState() != INVALID)
        return checkers.getVictoryState()==INVALID?checkers.getBoardScore()-checkers.getPlayer()*moves/100.:
                checkers.getPlayer() == checkers.getVictoryState() ? Double.MAX_VALUE*checkers.getPlayer():
                        -Double.MAX_VALUE*checkers.getPlayer();
    List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>> moves;
    if(absDepth == maxDepth)
        moves = (List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>>) node.getValues().get("moves");
    else
        moves = checkers.getAllPlayerMoves();
    if(moves.isEmpty()) //no moves left? then this player loses
        return checkers.getPlayer() * -Double.MAX_VALUE;
    double v = checkers.getPlayer() == WHITE ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
    for(Pair<Pair<Integer, Integer>, Pair<Integer, Integer>> i : moves){
        BitCheckers c = checkers.clone();
        c.movePiece(i.getFirst().getFirst(),i.getFirst().getSecond(),i.getSecond().getFirst(),i.getSecond().getSecond());
        int newDepth = c.getPlayer() == checkers.getPlayer() ? depth : depth - 1;
        if(checkers.getPlayer() == WHITE) {
            v = Math.max(v, alphaBeta(c, newDepth, absDepth - 1, alpha, beta));
            alpha = Math.max(alpha,v);
        }else {
            v = Math.min(v, alphaBeta(c, newDepth, absDepth - 1, alpha, beta));
            beta = Math.min(beta,v);
        }
        if(absDepth == maxDepth) {
            double finalScore = v;
            for(Node n : node.getChildren())
                if(n.getData().equals(i)){
                    n.setValue(finalScore);
                    break;
                }
        }
        /*
        If i comment these two lines out it works as expected
        if(beta<= alpha)
            break;
        */
    }
    return v;
}

I tested it with PVS, and it gives the same results as alpha-beta pruning, i.e. not nearly as good as plain minimax.

public double pvs(BitCheckers checkers, int depth, int absDepth, double alpha, double beta){
    if(checkers.movesWithoutAnything >= 40)
        return 0;//tie game//needs testing
    if(depth == 0 || checkers.getVictoryState() != INVALID)
        return checkers.getVictoryState()==INVALID?checkers.getBoardScore()-checkers.getPlayer()*moves/100.:
                checkers.getPlayer() == checkers.getVictoryState() ? Double.MAX_VALUE*checkers.getPlayer():
                        -Double.MAX_VALUE*checkers.getPlayer();
    List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>> moves;
    if(absDepth == maxDepth)
        moves = (List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>>) node.getValues().get("moves");
    else
        moves = checkers.getAllPlayerMoves();
    if(moves.isEmpty()) //no moves left? then this player loses
        return checkers.getPlayer() * -Double.MAX_VALUE;
    int j = 0;
    double score;
    for(Pair<Pair<Integer, Integer>, Pair<Integer, Integer>> i : moves){
        BitCheckers c = checkers.clone();
        c.movePiece(i.getFirst().getFirst(),i.getFirst().getSecond(),i.getSecond().getFirst(),i.getSecond().getSecond());
        int newDepth = c.getPlayer() == checkers.getPlayer() ? depth : depth - 1;
        double sign = c.getPlayer() == checkers.getPlayer()? -1 : 1;
        if(j++==0)
            score = -pvs(c,newDepth,absDepth-1,sign*-beta,sign*-alpha);
        else {
            score = -pvs(c,newDepth, absDepth-1,sign*-(alpha+1),sign*-alpha);
            if(alpha<score || score<beta)
                score = -pvs(c,newDepth,absDepth-1,sign*-beta,sign*-score);
        }
        if(absDepth == maxDepth) {
            double finalScore = score;
            for(Node n : node.getChildren())
                if(n.getData().equals(i)){
                    n.setValue(finalScore);
                    break;
                }
        }
        alpha = Math.max(alpha,score);
        if(alpha>=beta)
            break;
    }
    return alpha;
}

Checkers without alpha beta pruning is good, but not great. I know with a working version of alpha-beta it could be really great. Please help fix my alpha-beta pruning.

I understand it should give the same result; my question is why my implementation does not.

To confirm that it should give the same results, I wrote a quick test class.

public class MinimaxAlphaBetaTest {
    public static void main(String[] args) {
        Node<Double,Double> parent = new Node<>(0.,0.);
        int depth = 10;
        createTree(parent,depth);
        Timer t = new Timer().start();
        double ab = alphabeta(parent,depth+1,Double.NEGATIVE_INFINITY,Double.POSITIVE_INFINITY,true);
        t.stop();
        System.out.println("Alpha Beta: "+ab+", time: "+t.getTime());
        t = new Timer().start();
        double mm = minimax(parent,depth+1,true);
        t.stop();
        System.out.println("Minimax: "+mm+", time: "+t.getTime());
        t = new Timer().start();
        double pv = pvs(parent,depth+1,Double.NEGATIVE_INFINITY,Double.POSITIVE_INFINITY,1);
        t.stop();
        System.out.println("PVS: "+pv+", time: "+t.getTime());
        if(ab != mm)
            System.out.println(ab+"!="+mm);
    }

    public static void createTree(Node n, int depth){
        if(depth == 0) {
            n.getChildren().add(new Node<>(0.,(double) randBetween(1, 100)));
            return;
        }
        for (int i = 0; i < randBetween(2,10); i++) {
            Node nn = new Node<>(0.,0.);
            n.getChildren().add(nn);
            createTree(nn,depth-1);
        }
    }

    public static Random r = new Random();
    public static int randBetween(int min, int max){
        return r.nextInt(max-min+1)+min;
    }

    public static double pvs(Node<Double,Double> node, int depth, double alpha, double beta, int color){
        if(depth == 0 || node.getChildren().isEmpty())
            return color*node.getValue();
        int i = 0;
        double score;
        for(Node<Double,Double> child : node.getChildren()){
            if(i++==0)
                score = -pvs(child,depth-1,-beta,-alpha,-color);
            else {
                score = -pvs(child,depth-1,-alpha-1,-alpha,-color);
                if(alpha<score || score<beta)
                    score = -pvs(child,depth-1,-beta,-score,-color);
            }
            alpha = Math.max(alpha,score);
            if(alpha>=beta)
                break;
        }
        return alpha;
    }

    public static double alphabeta(Node<Double,Double> node, int depth, double alpha, double beta, boolean maximizingPlayer){
        if(depth == 0 || node.getChildren().isEmpty())
            return node.getValue();
        double v = maximizingPlayer ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for(Node<Double,Double> child : node.getChildren()){
            if(maximizingPlayer) {
                v = Math.max(v, alphabeta(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, v);
            }else {
                v = Math.min(v,alphabeta(child,depth-1,alpha,beta,true));
                beta = Math.min(beta,v);
            }
            if(beta <= alpha)
                break;
        }
        return v;
    }

    public static double minimax(Node<Double,Double> node, int depth, boolean maximizingPlayer){
        if(depth == 0 || node.getChildren().isEmpty())
            return node.getValue();
        double v = maximizingPlayer ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for(Node<Double,Double> child : node.getChildren()){
            if(maximizingPlayer)
                v = Math.max(v,minimax(child,depth-1,false));
            else
                v = Math.min(v,minimax(child,depth-1,true));
        }
        return v;
    }
}

This does in fact give what I expected: alpha-beta and PVS are both much faster than minimax (PVS is slower than alpha-beta because the children are in random order) and produce the same result as minimax. This shows that the algorithms are correct, but for whatever reason my implementations of them are wrong.

Alpha Beta: 28.0, time: 25.863126 milli seconds
Minimax: 28.0, time: 512.6119160000001 milli seconds
PVS: 28.0, time: 93.357653 milli seconds

Source Code for Checkers implementation

Pseudocode for pvs

Pseudocode for alpha beta i'm following

Full Source Code for the Tic-Tac-Toe Implementation

I'm pretty sure my error is in how i get the best value. I'm not sure yet how to fix it, or what exactly is the issue, but i'm still working to figure it out. — unsupo, Feb 08 '18 at 23:04
I'm having the same problem as you concerning bad nodes tying the optimal (correct) value. Did you manage to solve the problem ? — , Mar 07 '19 at 21:47

score 1 · Answer 1 · answered Feb 08 '18 at 01:58


I think you might be misunderstanding AB pruning.

AB pruning should give you the same result as minimax. It is just a way of not going down certain branches, because you already know that making that move would be worse than another move you have examined, which helps when you have massive trees.

Also, minimax that searches to the end of the game, without a heuristic or a depth cutoff, will always be undefeatable, because you have computed every possible path to every terminal state. So I would have expected both AB pruning and minimax to be unbeatable, which makes me think something is wrong with your AB pruning: if your minimax is undefeatable, your AB pruning version should be too.
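
One possible cause, consistent with the tied values described in the question, is how the root reads its children's scores. Once the window has narrowed, a child that is cut off by `beta <= alpha` returns only a bound on its value, and that bound can be equal to the current best. The sketch below is a toy illustration under assumptions, not the asker's code: `RootTieSketch`, `T`, `bestMoves`, `naiveBest`, `exactBest` and `demoTree` are hypothetical names, and the root is assumed to be the maximizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RootTieSketch {
    /** Hypothetical toy node: a leaf holds a value, an inner node holds children. */
    static final class T {
        final double value;
        final List<T> kids;
        T(double value) { this.value = value; this.kids = new ArrayList<>(); }
        T(T... kids) { this.value = 0; this.kids = Arrays.asList(kids); }
    }

    /** Plain alpha-beta using the same cutoff test as in the question. */
    static double alphaBeta(T n, double alpha, double beta, boolean max) {
        if (n.kids.isEmpty()) return n.value;
        double v = max ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (T k : n.kids) {
            double s = alphaBeta(k, alpha, beta, !max);
            if (max) { v = Math.max(v, s); alpha = Math.max(alpha, v); }
            else     { v = Math.min(v, s); beta = Math.min(beta, v); }
            if (beta <= alpha) break; // after a cutoff, v is only a bound
        }
        return v;
    }

    /**
     * Scores every root child and returns the indices whose score equals the best.
     * With shareWindow = true the running best is passed down as alpha.
     */
    static List<Integer> bestMoves(T root, boolean shareWindow) {
        double[] scores = new double[root.kids.size()];
        double best = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            double alpha = shareWindow ? best : Double.NEGATIVE_INFINITY;
            scores[i] = alphaBeta(root.kids.get(i), alpha, Double.POSITIVE_INFINITY, false);
            best = Math.max(best, scores[i]);
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < scores.length; i++)
            if (scores[i] == best) result.add(i);
        return result;
    }

    static List<Integer> naiveBest(T root) { return bestMoves(root, true); }
    static List<Integer> exactBest(T root) { return bestMoves(root, false); }

    /** Move 0 is worth min(3, 5) = 3; move 1 is really worth min(3, 1) = 1. */
    static T demoTree() {
        return new T(new T(new T(3), new T(5)), new T(new T(3), new T(1)));
    }

    public static void main(String[] args) {
        System.out.println("shared window: " + naiveBest(demoTree())); // [0, 1]
        System.out.println("full window:   " + exactBest(demoTree())); // [0]
    }
}
```

On this toy tree the shared-window version reports both moves as best, because move 1 is cut off after its first reply and returns the bound 3 instead of its true value 1. Searching every root child with a full window gives up pruning between root moves but keeps it inside each subtree. If only one best move is needed, keeping the shared window and accepting a move only when its score is strictly greater than the best so far should avoid the extra work.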

Benjamin Liu


I'm sure there is something wrong with my AB pruning, but I'm struggling to figure out what that is. Can you please explain what is wrong with my implementation? – unsupo Feb 08 '18 at 17:23
This doesn't answer my question. I realize AB pruning should give the same results as minimax and that it just cuts branches that are worse. I understand that exploring every possible path to the terminating states would be undefeatable. FYI, I do use a heuristic (especially in the checkers implementation); it might not be the best heuristic, but it's still a heuristic. My real question is: what is wrong with my approach? – unsupo Feb 08 '18 at 20:43
