
I am currently writing an AI for the board game Hex. I want to use Monte Carlo Tree Search (MCTS) for it and have already tried to implement it. However, the AI makes incredibly stupid (seemingly random) moves and I cannot figure out why it's not working.

import java.util.ArrayList;
import java.util.Random;

/**
 * Created by Robin on 18.03.2017.
 */
public class TreeNode {


    private static final Random random = new Random();
    private static final double epsion=10e-5;
    protected double nvisits;
    protected double totValue;
    protected int move=-1;

    private HexBoard board;
    protected ArrayList<TreeNode>children ;



    public TreeNode(HexBoard board){
        this.board =board;
    }


    //Copy-Constructor
    public TreeNode(TreeNode treeNode){
        this.nvisits=treeNode.nvisits;
        this.totValue=treeNode.totValue;
        this.move=treeNode.move;
        this.board = new HexBoard(treeNode.board);

    }

    public void update(double value){
        totValue+=value*board.color;
        nvisits++;
    }



    public void expand(){
        assert(children==null);
        children = new ArrayList<>(121-board.moveCount);
        for(int i=0;i<121;i++){
            if(board.board[i]!=HexBoard.EMPTY)
                continue;

                TreeNode newNode = new TreeNode(board);
                newNode.move =i;
                children.add(newNode);

        }
    }

    public void calculateIteration(){
        ArrayList<TreeNode>visited = new ArrayList<>();
        TreeNode current =this;
        visited.add(current);

        while(!current.isLeafNode()){
            current =current.select();
            board.makeMove(current.move);
            visited.add(current);
        }

        //Found a leaf node
        double value;
        if(current.board.getWinner()==0){
            current.expand();
            TreeNode newNode =current.select();
            value =playOut(newNode.board);
        }else{
            value =current.board.getWinner();
        }

        //update all the nodes

        for(int i=1;i<visited.size();i++){
            visited.get(i).update(value);
            board.undoMove(visited.get(i).move);
        }
        visited.get(0).update(value);
    }

    public static int playOut(HexBoard board){
        int winner=0;

        if(board.moveCount==121) {
            winner=board.getWinner();

            return winner;
        }

        //Checking-Movecount vs actual stones on the board


        final double left =121-board.moveCount;
        double probibility =1/left;
        double summe =0;
        double p =random.nextDouble();

        int randomMove =0;
        for(int i=0;i<121;i++){
            if(board.board[i]!=HexBoard.EMPTY)
                continue;

            summe+=probibility;

            if(p<=summe && probibility!=0) {
                randomMove = i;
                break;
            }
        }

        board.makeMove(randomMove);
        winner =playOut(board);
        board.undoMove(randomMove);

        return winner;
    }


    public TreeNode select(){

        TreeNode bestNode=null;
        double bestValue =-10000000;
        for(TreeNode node : children){

            double uctvalue =(node.nvisits==0)?100000:(node.totValue/(node.nvisits)+Math.sqrt((Math.log(this.nvisits))/(2*node.nvisits)));
            uctvalue+=epsion*random.nextDouble();

            if(uctvalue>bestValue){
                bestValue=uctvalue;
                bestNode =node;
            }
        }

        return bestNode;
        ///
    }

    public boolean isLeafNode(){
        return (children==null);
    }
}

Is my implementation of the method calculateIteration() correct?

I know this might not be a very attractive problem to look at, but I would appreciate any help.

CheckersGuy
  • This is too broad. Please do some debugging to narrow this down to a much simpler problem and a [minimal test case](https://stackoverflow.com/help/mcve). – Oliver Charlesworth Mar 20 '17 at 20:24
  • Are you actually keeping track of which player makes which moves? Do you alternate turns within your iterations? To me it looks like you're just letting the current player fill up the entire board within your simulations, as if there were no opponent. Or did I miss something? Also, it would be useful to tell us how many simulations you're running, and how you ultimately decide which move to play in the "real" game – Dennis Soemers Mar 21 '17 at 11:24
  • Sorry, I should have clarified this. The board.makeMove() function alternates between the two players. I tried everything from 100 to 50,000 simulations and the outcome was pretty much the same (bad, seemingly random moves). The "best" child of the root node, i.e. the one with the highest UCT value, is the move the AI plays. – CheckersGuy Mar 21 '17 at 11:57
  • @CheckersGuy First off, your select() implementation should take into account which player is to move. If your opponent is the one making the move, you should negate the totValue/nvisits part of the uctvalue computation (note: DON'T negate the entire uctvalue, only the part where you compute the score. The part under the sqrt should not be negated). In your current implementation of select(), the AI assumes its opponent will help it. – Dennis Soemers Mar 21 '17 at 12:34
  • At the end, I would not use the highest UCT value to decide which move to play, but only the totValue/nvisits part (the average score). The part of the UCT value under the square root only encourages exploration of parts of the search tree that you haven't explored much yet, and that no longer matters when making a move in the "real" game. As for simulation count, 100 definitely sounds too low, but something closer to 50K should be fine (for a player that is clearly better than random) – Dennis Soemers Mar 21 '17 at 12:36
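A sketch of the final move choice described in the last comment: pick the root child with the highest average score totValue/nvisits, with no exploration term and no random noise. The `BestMove` class, its `Child` type and the `bestMove()` helper are hypothetical stand-ins for the question's TreeNode, and totValue is assumed to be summed from the root player's point of view.

```java
import java.util.List;

public class BestMove {

    // Hypothetical stand-in for a child of the root: its move and statistics.
    static final class Child {
        final int move;
        final double nvisits;
        final double totValue; // assumed: summed results from the root player's view

        Child(int move, double nvisits, double totValue) {
            this.move = move;
            this.nvisits = nvisits;
            this.totValue = totValue;
        }
    }

    /** Returns the move with the highest average score, or -1 if no child was visited. */
    static int bestMove(List<Child> children) {
        int best = -1;
        double bestAverage = Double.NEGATIVE_INFINITY;
        for (Child c : children) {
            if (c.nvisits == 0) {
                continue; // an unvisited move has no score yet
            }
            double average = c.totValue / c.nvisits; // exploitation only, no sqrt term
            if (average > bestAverage) {
                bestAverage = average;
                best = c.move;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Child> children = List.of(
                new Child(5, 40, 20),  // average 0.5
                new Child(17, 30, 3),  // average 0.1
                new Child(60, 0, 0));  // never visited: skipped
        System.out.println(bestMove(children)); // prints 5
    }
}
```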

1 Answer


The OP added extra information in the comments below the question. The important part of that extra information is that the makeMove() method checks which player is to play next (so that updates to the board are correct).

Given that information, the implementation of select() in the OP is not correct, because it does not take into account which player is to move when computing the UCT score. The UCT score consists of an "exploitation" part (the first fraction, computing the average score over all previous simulations) and an "exploration" part (the part under the square root, which increases for nodes that have been visited rarely relative to their parent). The exploitation part of this equation should be negated when the opponent is the one to move. If this is not done, the AI essentially assumes that the opponent is willing to actively help it, instead of assuming that the opponent will try to win for himself.
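A minimal sketch of a player-aware select(), assuming totValue is accumulated from the root player's perspective (+1 for a root-player win, -1 for a loss) and that the caller tells select() whether the root player is choosing at this node. The `UctSelect`/`Node` names and the `rootPlayerToMove` parameter are hypothetical; the question's HexBoard was not posted. Only the exploitation term flips sign; the exploration term under the square root keeps the same form as in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class UctSelect {

    static class Node {
        double nvisits;
        double totValue; // assumed: sum of results from the ROOT player's view
        int move;
        List<Node> children = new ArrayList<>();

        Node(int move, double nvisits, double totValue) {
            this.move = move;
            this.nvisits = nvisits;
            this.totValue = totValue;
        }

        /** rootPlayerToMove: true if the root player chooses at this node, false if the opponent does. */
        Node select(boolean rootPlayerToMove) {
            double sign = rootPlayerToMove ? 1.0 : -1.0;
            Node best = null;
            double bestValue = Double.NEGATIVE_INFINITY;
            for (Node child : children) {
                double uct;
                if (child.nvisits == 0) {
                    uct = Double.POSITIVE_INFINITY; // try unvisited children first
                } else {
                    // Exploitation: negated when the opponent is choosing.
                    double exploitation = sign * (child.totValue / child.nvisits);
                    // Exploration: never negated.
                    double exploration = Math.sqrt(Math.log(this.nvisits) / (2 * child.nvisits));
                    uct = exploitation + exploration;
                }
                if (uct > bestValue) {
                    bestValue = uct;
                    best = child;
                }
            }
            return best;
        }
    }

    public static void main(String[] args) {
        Node root = new Node(-1, 20, 0);
        root.children.add(new Node(0, 10, 8));   // good for the root player
        root.children.add(new Node(1, 10, -8));  // good for the opponent
        System.out.println(root.select(true).move);  // prints 0
        System.out.println(root.select(false).move); // prints 1
    }
}
```

An equivalent design stores each node's value from the perspective of the player who made the move leading into it; select() can then always maximize without an explicit sign flip.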

Dennis Soemers
  • Thanks. Now it runs very well. I just tested it with 5000 simulations and I cannot win :P – CheckersGuy Mar 21 '17 at 16:38
  • The best move is the one with the highest win fraction, not the highest UCT value (which is used to guide further exploration of the tree), especially not after introducing a random component. Other implementers have achieved perfect play after about 1,000 to 1,500 playouts. – david.pfx Jul 04 '17 at 23:18