---------------
Actual Question
---------------
OK, the real problem is not alpha-beta pruning versus minimax as such. Plain minimax gives only the genuinely best moves at the root, whereas my alpha-beta version returns the correct value for the root but assigns that same best value to several root children, some of which should not have it.
I guess the ultimate question is: what is the most efficient way to get the best child of the root node (possibly several children, in the case of a tie)?
The algorithm produces the correct value, but multiple nodes tie with that value, even though some of the moves are obviously wrong.
Example: Tic-Tac-Toe
-|-|O
-|X|-
-|X|-
With my heuristic, this produces the moves (0,1) and (1,0), both with a value of -0.06.
(0,1) is the correct move, as it blocks my X's, but (1,0) is wrong, because on the next move I can put an X at (0,1) and win.
When I run the same algorithm without the
if(beta<=alpha)
    break;
it returns only (0,1), with value -0.06.
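One possible way to collect every genuinely best root move is sketched below. It assumes that a score coming back from alpha-beta is exact only when it lies strictly inside the (alpha, beta) window it was searched with, so each root child is searched with a full window and pruning is used only below the root. The class `BestRootMoves`, its helpers, and the plain win/draw/loss scoring (+1, 0, -1, no depth term) are hypothetical simplifications, not the code from this question.

```java
import java.util.ArrayList;
import java.util.List;

class BestRootMoves {
    static final int X = 1, O = -1, EMPTY = 0;

    // Returns X or O if that side has three in a row, EMPTY otherwise.
    static int winner(int[][] b) {
        for (int i = 0; i < 3; i++) {
            if (b[i][0] != EMPTY && b[i][0] == b[i][1] && b[i][1] == b[i][2]) return b[i][0];
            if (b[0][i] != EMPTY && b[0][i] == b[1][i] && b[1][i] == b[2][i]) return b[0][i];
        }
        boolean diag = b[0][0] == b[1][1] && b[1][1] == b[2][2];
        boolean anti = b[0][2] == b[1][1] && b[1][1] == b[2][0];
        return (diag || anti) ? b[1][1] : EMPTY;
    }

    // Ordinary alpha-beta: X maximizes, O minimizes; scores are +1, 0 or -1.
    static int alphaBeta(int[][] b, int player, int alpha, int beta) {
        int w = winner(b);
        if (w != EMPTY) return w;
        int best = player == X ? -2 : 2;   // sentinel outside the score range
        boolean moved = false;
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                if (b[r][c] != EMPTY) continue;
                moved = true;
                b[r][c] = player;
                int s = alphaBeta(b, -player, alpha, beta);
                b[r][c] = EMPTY;           // undo the move before any return
                if (player == X) { best = Math.max(best, s); alpha = Math.max(alpha, best); }
                else             { best = Math.min(best, s); beta = Math.min(beta, best); }
                if (beta <= alpha) return best;   // cutoff: result is only a bound
            }
        }
        return moved ? best : 0;           // full board without a winner is a draw
    }

    // Root search: every child gets the full window (-2, 2), so its score is exact.
    static List<int[]> bestMoves(int[][] b, int player) {
        List<int[]> best = new ArrayList<>();
        int bestScore = player == X ? -2 : 2;
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                if (b[r][c] != EMPTY) continue;
                b[r][c] = player;
                int s = alphaBeta(b, -player, -2, 2);
                b[r][c] = EMPTY;
                boolean better = player == X ? s > bestScore : s < bestScore;
                if (better) { best.clear(); bestScore = s; }
                if (s == bestScore) best.add(new int[] {r, c});   // real ties only
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] board = { {EMPTY, EMPTY, O}, {EMPTY, X, EMPTY}, {EMPTY, X, EMPTY} };
        for (int[] m : bestMoves(board, O))
            System.out.println("(" + m[0] + "," + m[1] + ")");
    }
}
```

For the board in the example, with O to move, this sketch reports only (0,1). The trade-off is that nothing is pruned at the root itself; whether that cost matters depends on the branching factor of the game.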
---------------
Originally posted question, now just sugar
---------------
I've spent days trying to figure out why my minimax algorithm works, but stops working when I add alpha-beta pruning to it. I understand they should give the same results, and I even made a quick test of that. My question is: why doesn't my implementation produce the same results?
This is a tic-tac-toe implementation on Android. I can sometimes beat the algorithm when
if(beta<=alpha) break;
is not commented out, but when it is commented out the algorithm is undefeatable.
private static double minimax(Node<Integer,Integer> parent, int player, final int[][] board, double alpha, double beta, int depth) {
    List<Pair<Integer, Integer>> moves = getAvailableMoves(board);
    int bs = getBoardScore(board);
    if (moves.isEmpty() || Math.abs(bs) == board.length)//leaf node
        return bs+(player==X?-1:1)*depth/10.;
    double bestVal = player == X ? -Integer.MAX_VALUE : Integer.MAX_VALUE;
    for(Pair<Integer, Integer> s : moves){
        int[][] b = clone(board);
        b[s.getFirst()][s.getSecond()]=player;
        Node<Integer, Integer> n = new Node<>(bs,b.hashCode());
        parent.getChildren().add(n);
        n.setParent(parent);
        double score = minimax(n,player==O?X:O,b,alpha,beta, depth+1);
        n.getValues().put("score",score);
        n.getValues().put("pair",s);
        if(player == X) {
            bestVal = Math.max(bestVal, score);
            alpha = Math.max(alpha,bestVal);
        } else {
            bestVal = Math.min(bestVal, score);
            beta = Math.min(beta,bestVal);
        }
        /*
        If I comment these two lines out it works as expected
        if(beta<= alpha)
            break;
        */
    }
    return bestVal;
}
Now this wouldn't be a problem for tic-tac-toe because of its small search tree, but I then developed it for checkers and noticed the same phenomenon.
private double alphaBeta(BitCheckers checkers, int depth, int absDepth, double alpha, double beta){
    if(checkers.movesWithoutAnything >= 40)
        return 0;//tie game//needs testing
    if(depth == 0 || checkers.getVictoryState() != INVALID)
        return checkers.getVictoryState()==INVALID?checkers.getBoardScore()-checkers.getPlayer()*moves/100.:
                checkers.getPlayer() == checkers.getVictoryState() ? Double.MAX_VALUE*checkers.getPlayer():
                        -Double.MAX_VALUE*checkers.getPlayer();
    List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>> moves;
    if(absDepth == maxDepth)
        moves = (List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>>) node.getValues().get("moves");
    else
        moves = checkers.getAllPlayerMoves();
    if(moves.isEmpty()) //no moves left? then this player loses
        return checkers.getPlayer() * -Double.MAX_VALUE;
    double v = checkers.getPlayer() == WHITE ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
    for(Pair<Pair<Integer, Integer>, Pair<Integer, Integer>> i : moves){
        BitCheckers c = checkers.clone();
        c.movePiece(i.getFirst().getFirst(),i.getFirst().getSecond(),i.getSecond().getFirst(),i.getSecond().getSecond());
        int newDepth = c.getPlayer() == checkers.getPlayer() ? depth : depth - 1;
        if(checkers.getPlayer() == WHITE) {
            v = Math.max(v, alphaBeta(c, newDepth, absDepth - 1, alpha, beta));
            alpha = Math.max(alpha,v);
        }else {
            v = Math.min(v, alphaBeta(c, newDepth, absDepth - 1, alpha, beta));
            beta = Math.min(beta,v);
        }
        if(absDepth == maxDepth) {
            double finalScore = v;
            for(Node n : node.getChildren())
                if(n.getData().equals(i)){
                    n.setValue(finalScore);
                    break;
                }
        }
        /*
        If I comment these two lines out it works as expected
        if(beta<= alpha)
            break;
        */
    }
    return v;
}
I tested it with PVS and it gives the same results as alpha-beta pruning, i.e. not nearly as good as plain minimax.
public double pvs(BitCheckers checkers, int depth, int absDepth, double alpha, double beta){
    if(checkers.movesWithoutAnything >= 40)
        return 0;//tie game//needs testing
    if(depth == 0 || checkers.getVictoryState() != INVALID)
        return checkers.getVictoryState()==INVALID?checkers.getBoardScore()-checkers.getPlayer()*moves/100.:
                checkers.getPlayer() == checkers.getVictoryState() ? Double.MAX_VALUE*checkers.getPlayer():
                        -Double.MAX_VALUE*checkers.getPlayer();
    List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>> moves;
    if(absDepth == maxDepth)
        moves = (List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>>) node.getValues().get("moves");
    else
        moves = checkers.getAllPlayerMoves();
    if(moves.isEmpty()) //no moves left? then this player loses
        return checkers.getPlayer() * -Double.MAX_VALUE;
    int j = 0;
    double score;
    for(Pair<Pair<Integer, Integer>, Pair<Integer, Integer>> i : moves){
        BitCheckers c = checkers.clone();
        c.movePiece(i.getFirst().getFirst(),i.getFirst().getSecond(),i.getSecond().getFirst(),i.getSecond().getSecond());
        int newDepth = c.getPlayer() == checkers.getPlayer() ? depth : depth - 1;
        double sign = c.getPlayer() == checkers.getPlayer()? -1 : 1;
        if(j++==0)
            score = -pvs(c,newDepth,absDepth-1,sign*-beta,sign*-alpha);
        else {
            score = -pvs(c,newDepth, absDepth-1,sign*-(alpha+1),sign*-alpha);
            if(alpha<score || score<beta)
                score = -pvs(c,newDepth,absDepth-1,sign*-beta,sign*-score);
        }
        if(absDepth == maxDepth) {
            double finalScore = score;
            for(Node n : node.getChildren())
                if(n.getData().equals(i)){
                    n.setValue(finalScore);
                    break;
                }
        }
        alpha = Math.max(alpha,score);
        if(alpha>=beta)
            break;
    }
    return alpha;
}
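For comparison, here is a minimal sketch of the textbook negamax form of PVS. In that form the full-window re-search is done only when the null-window result lies strictly inside the window, i.e. `alpha < score && score < beta`, and the window (alpha, alpha+1) is genuinely empty only for integer scores. The sketch therefore uses `int` scores on a hypothetical tree type `N`; the class name, `INF`, and `randomTree` are illustrative assumptions, and nothing here establishes whether these differences explain the behaviour described in this question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class PvsSketch {
    static final int INF = 1_000_000;   // assumed to exceed any leaf value

    // Hypothetical tree node: a leaf has no children and carries a value.
    static final class N {
        final int value;
        final List<N> kids = new ArrayList<>();
        N(int value) { this.value = value; }
    }

    // Builds a random tree with leaf values 1..100, for comparison runs only.
    static N randomTree(Random r, int depth) {
        if (depth == 0) return new N(1 + r.nextInt(100));
        N n = new N(0);
        int branching = 2 + r.nextInt(3);
        for (int i = 0; i < branching; i++) n.kids.add(randomTree(r, depth - 1));
        return n;
    }

    // Reference negamax without pruning; color is +1 or -1 for the side to move.
    static int negamax(N n, int color) {
        if (n.kids.isEmpty()) return color * n.value;
        int best = -INF;
        for (N c : n.kids) best = Math.max(best, -negamax(c, -color));
        return best;
    }

    static int pvs(N n, int alpha, int beta, int color) {
        if (n.kids.isEmpty()) return color * n.value;
        boolean first = true;
        for (N c : n.kids) {
            int score;
            if (first) {
                score = -pvs(c, -beta, -alpha, -color);          // full window
                first = false;
            } else {
                score = -pvs(c, -alpha - 1, -alpha, -color);     // null window
                if (alpha < score && score < beta)               // strictly inside
                    score = -pvs(c, -beta, -alpha, -color);      // full re-search
            }
            alpha = Math.max(alpha, score);
            if (alpha >= beta) break;                            // cutoff
        }
        return alpha;
    }

    public static void main(String[] args) {
        N root = randomTree(new Random(1), 5);
        System.out.println("negamax: " + negamax(root, 1));
        System.out.println("pvs:     " + pvs(root, -INF, INF, 1));
    }
}
```

With a full window at the root, both functions should print the same value. Note that only the root value is guaranteed exact; scores seen for individual children during the loop can be bounds.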
Checkers without alpha-beta pruning is good, but not great. I know that with a working version of alpha-beta it could be really great. Please help me fix my alpha-beta pruning.
I understand it should give the same result; my question is why my implementation does not.
To confirm that it should give the same results, I made a quick test class.
public class MinimaxAlphaBetaTest {
    public static void main(String[] args) {
        Node<Double,Double> parent = new Node<>(0.,0.);
        int depth = 10;
        createTree(parent,depth);
        Timer t = new Timer().start();
        double ab = alphabeta(parent,depth+1,Double.NEGATIVE_INFINITY,Double.POSITIVE_INFINITY,true);
        t.stop();
        System.out.println("Alpha Beta: "+ab+", time: "+t.getTime());
        t = new Timer().start();
        double mm = minimax(parent,depth+1,true);
        t.stop();
        System.out.println("Minimax: "+mm+", time: "+t.getTime());
        t = new Timer().start();
        double pv = pvs(parent,depth+1,Double.NEGATIVE_INFINITY,Double.POSITIVE_INFINITY,1);
        t.stop();
        System.out.println("PVS: "+pv+", time: "+t.getTime());
        if(ab != mm)
            System.out.println(ab+"!="+mm);
    }
    public static void createTree(Node n, int depth){
        if(depth == 0) {
            n.getChildren().add(new Node<>(0.,(double) randBetween(1, 100)));
            return;
        }
        for (int i = 0; i < randBetween(2,10); i++) {
            Node nn = new Node<>(0.,0.);
            n.getChildren().add(nn);
            createTree(nn,depth-1);
        }
    }
    public static Random r = new Random();
    public static int randBetween(int min, int max){
        return r.nextInt(max-min+1)+min;
    }
    public static double pvs(Node<Double,Double> node, int depth, double alpha, double beta, int color){
        if(depth == 0 || node.getChildren().isEmpty())
            return color*node.getValue();
        int i = 0;
        double score;
        for(Node<Double,Double> child : node.getChildren()){
            if(i++==0)
                score = -pvs(child,depth-1,-beta,-alpha,-color);
            else {
                score = -pvs(child,depth-1,-alpha-1,-alpha,-color);
                if(alpha<score || score<beta)
                    score = -pvs(child,depth-1,-beta,-score,-color);
            }
            alpha = Math.max(alpha,score);
            if(alpha>=beta)
                break;
        }
        return alpha;
    }
    public static double alphabeta(Node<Double,Double> node, int depth, double alpha, double beta, boolean maximizingPlayer){
        if(depth == 0 || node.getChildren().isEmpty())
            return node.getValue();
        double v = maximizingPlayer ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for(Node<Double,Double> child : node.getChildren()){
            if(maximizingPlayer) {
                v = Math.max(v, alphabeta(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, v);
            }else {
                v = Math.min(v,alphabeta(child,depth-1,alpha,beta,true));
                beta = Math.min(beta,v);
            }
            if(beta <= alpha)
                break;
        }
        return v;
    }
    public static double minimax(Node<Double,Double> node, int depth, boolean maximizingPlayer){
        if(depth == 0 || node.getChildren().isEmpty())
            return node.getValue();
        double v = maximizingPlayer ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for(Node<Double,Double> child : node.getChildren()){
            if(maximizingPlayer)
                v = Math.max(v,minimax(child,depth-1,false));
            else
                v = Math.min(v,minimax(child,depth-1,true));
        }
        return v;
    }
}
This does in fact give what I expected: alpha-beta and PVS are both far faster than minimax (PVS is slower than alpha-beta because the children are in random order) and produce the same result as minimax. This shows that the algorithms themselves are correct, but for whatever reason my implementations of them are wrong.
Alpha Beta: 28.0, time: 25.863126 milli seconds
Minimax: 28.0, time: 512.6119160000001 milli seconds
PVS: 28.0, time: 93.357653 milli seconds
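The test above compares only the value at the root. A minimal sketch of what can happen one level down is shown here, assuming the same `beta <= alpha` cutoff as in the code above: the root value matches minimax, but the score recorded for a pruned root child is only a bound and can tie with the best value. The class `RootBoundsDemo`, the node type `N`, and the four-leaf tree are hypothetical and chosen by hand for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RootBoundsDemo {
    // Hypothetical minimal node: leaves carry a value, inner nodes carry children.
    static final class N {
        final double value;
        final List<N> kids = new ArrayList<>();
        N(double value) { this.value = value; }
        static N inner(N... cs) {
            N n = new N(0);
            n.kids.addAll(Arrays.asList(cs));
            return n;
        }
    }

    // Max root with two min children: min(5, 6) = 5 and min(5, 1) = 1.
    static N sampleTree() {
        return N.inner(N.inner(new N(5), new N(6)), N.inner(new N(5), new N(1)));
    }

    static double minimax(N n, boolean max) {
        if (n.kids.isEmpty()) return n.value;
        double v = max ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (N c : n.kids)
            v = max ? Math.max(v, minimax(c, false)) : Math.min(v, minimax(c, true));
        return v;
    }

    static double alphabeta(N n, double alpha, double beta, boolean max) {
        if (n.kids.isEmpty()) return n.value;
        double v = max ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (N c : n.kids) {
            if (max) { v = Math.max(v, alphabeta(c, alpha, beta, false)); alpha = Math.max(alpha, v); }
            else     { v = Math.min(v, alphabeta(c, alpha, beta, true));  beta = Math.min(beta, v); }
            if (beta <= alpha) break;   // same cutoff test as in the question
        }
        return v;
    }

    // Score seen for each child of a maximizing root, with or without pruning.
    static double[] rootChildScores(N root, boolean prune) {
        double[] out = new double[root.kids.size()];
        double alpha = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < out.length; i++) {
            N child = root.kids.get(i);
            out[i] = prune
                    ? alphabeta(child, alpha, Double.POSITIVE_INFINITY, false)
                    : minimax(child, false);
            alpha = Math.max(alpha, out[i]);   // window is shared across siblings
        }
        return out;
    }

    public static void main(String[] args) {
        N root = sampleTree();
        System.out.println(Arrays.toString(rootChildScores(root, false))); // [5.0, 1.0]
        System.out.println(Arrays.toString(rootChildScores(root, true)));  // [5.0, 5.0]
    }
}
```

In this tree the second child is cut off after its first leaf, because beta has dropped to 5 while alpha is already 5, so it reports 5 even though its exact value is 1. The root value is 5 either way, which is why a root-only comparison does not notice the difference.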
Source Code for Checkers implementation