I am writing a reinforcement learning checkers engine, and I've hit a performance roadblock. The network learns quickly on my machine, but my game/MCTS implementation is so slow that it takes hours for the network to play a few hundred games against itself. This makes the learning process infeasibly slow, so I've decided to rewrite the game using bitboards. This is my first implementation, before the rewrite:

import numpy as np

def jump_generator(i, j, board, player, is_king, jumped):

    d = -(-1)**(i%2)
    no_move = True

    for l in [player, -player][0:is_king+1]:
        for k in [j+d,j]:
            if -1 < i+2*l and i+2*l < 8:
                if -1 < j+d*(-1)**(j == k) and j+d*(-1)**(j == k) < 4:
                    if board[i+l][k]*player < 0:
                        if board[i+2*l][j+d*(2*(j != k)-1)] == 0:
                            if not (i+l,k) in jumped:
                                no_move = False
                                for continuation in jump_generator(i+2*l,
                                                                    j+d*(-1)**(j == k),
                                                                    board,
                                                                    player,
                                                                    is_king,
                                                                    jumped+[(i+l, k)]):
                                    yield [(i+l, k)] + continuation
    if no_move:
        yield [(i, j)]

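def move_generator(i, j, board, player, is_king):

    # Even rows reach columns j-1 and j on an adjacent row; odd rows reach j and j+1.
    d = -(-1)**(i%2)
    no_move = True

    # Men only step toward the opponent (l = player); kings also step backward.
    for l in [player, -player][0:is_king+1]:
        for k in [j+d,j]:
            if -1 < i+l and i+l < 8:
                if -1 < k and k < 4:
                    if board[i+l][k]*player == 0:
                        yield [(i, j), (i+l, k)]
                        no_move = False
    if no_move:
        yield [(i, j)]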

class GameState():

    def __init__(self, board, counter=None, legal=None, score=None):

        self.board = board
        self.turn = self.board[-1]

        if legal is None:
            moves = []
            jumps = []
            for i, row in enumerate(self.board[:-1]):
                for j, square in enumerate(row):
                    if square*self.turn > 0:
                        for jump_seq in jump_generator(i, j, self.board[:-1], self.turn, square**2 > 1, []):
                            if len(jump_seq) > 1:
                                jumps.append([(i, j)]+jump_seq)
                        if not jumps:
                            for move in move_generator(i, j, self.board[:-1], self.turn, square**2 > 1):
                                if len(move) > 1:
                                    moves.append(move)
            if jumps:
                legal = jumps
            else:
                legal = moves
            legal = [tuple(a) for a in legal]
        self.legal = legal

        if counter is None:
            counter = 0
        self.counter = counter

        if score is None:
            if self.counter > 50:
                score = 0
            if not self.legal:
                score = -self.turn
        self.score = score

    def next_board(self, move):
        board = [list(row) for row in self.board[:-1]]
        is_king = board[move[0][0]][move[0][1]]**2 > 1
        if not is_king and 2*move[-1][0] == 7+7*self.turn:
            board[move[-1][0]][move[-1][1]] = self.turn*2
        else:
            board[move[-1][0]][move[-1][1]] = board[move[0][0]][move[0][1]]
        board[move[0][0]][move[0][1]] = 0
        move = move[1:-1]
        for square in move:
            board[square[0]][square[1]] = 0
        return tuple([tuple(row) for row in board])+(-self.turn,)

    def next_state(self, move):
        man_moved = self.board[move[0][0]][move[0][1]]**2 == 1
        reset_counter = man_moved or len(move) > 2
        return GameState(self.next_board(move), (not reset_counter)*(self.counter+1)) 

Essentially, the board is an 8x4 array (8 rows, each holding that row's 4 playable squares) with entries -2, -1, 0, 1, or 2. The sign gives the owner, a magnitude of 2 marks a king, and 0 is an empty square. The side to move is stored as an extra final element, which is why the code reads self.board[-1] and slices self.board[:-1]. The helper functions jump_generator and move_generator take a given square (i, j) and yield the steps/captures that are possible from there. jump_generator is recursive so that it generates forced sequences of captures. A GameState object is initialized with the board array, from which it generates its legal actions using the helper functions. That's basically it. It also has a method to generate a new state from one of its own legal actions.
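
To make the compressed layout described above concrete, here is a minimal sketch that builds a start position in the same shape and renders it on a full 8x8 grid. The names start_board, to_full_column and render are hypothetical and not part of the code above. The piece placement (player 1 on rows 0-2, player -1 on rows 5-7) is an assumption, chosen because player 1 moves toward higher rows and promotes on row 7 in next_board. The column mapping is one layout consistent with d = -(-1)**(i%2).

```python
def start_board():
    """Assumed start position: player 1 on rows 0-2, player -1 on rows 5-7."""
    rows = []
    for i in range(8):
        if i < 3:
            rows.append((1,) * 4)
        elif i > 4:
            rows.append((-1,) * 4)
        else:
            rows.append((0,) * 4)
    # The final element is the side to move, mirroring self.turn = self.board[-1].
    return tuple(rows) + (1,)


def to_full_column(i, j):
    """Map compressed column j on row i to a column of the full 8x8 board.

    Assumption: even rows occupy columns 0, 2, 4, 6 and odd rows 1, 3, 5, 7.
    With this layout an even-row square touches columns j-1 and j of the
    adjacent rows, and an odd-row square touches j and j+1.
    """
    return 2 * j + (i % 2)


def render(board):
    """Return a printable 8x8 picture of the board (turn element ignored)."""
    symbols = {0: ".", 1: "x", 2: "X", -1: "o", -2: "O"}
    lines = []
    for i, row in enumerate(board[:-1]):
        line = [" "] * 8
        for j, square in enumerate(row):
            line[to_full_column(i, j)] = symbols[square]
        lines.append("".join(line))
    return "\n".join(lines)
```

Calling print(render(start_board())) shows the staggered rows, which can help when checking a generated move by eye.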

Below is my newer implementation using bitboards. The 2D array from the previous version is now four 32-bit integers. Each integer represents one piece type, and each bit of the integer marks whether a piece of that type is present on the corresponding square. The board is encoded so that a corner-adjacent square is reached by rotating the bits by a fixed amount per move 'vector'. There are functions for these bit operations, and constant integer masks that encode, for each move 'vector', the squares from which that move would go out of bounds. There is a function that takes the four piece planes (the ints) and returns 8 move planes: four for steps and four for jumps. This is all done using bit logic on np.uint32 values. I even used some memoization. Lastly, these planes are 'iterated' by bit shifts to find the legal moves and generate the resulting position.

import numpy as np
# Note: despite the name, int32 is an alias for the unsigned 32-bit type.
int32 = np.uint32
int8 = np.uint8
zero = int32(0)
one = int32(1)

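def rot(b, n):
    # Rotate the 32-bit word b right by n places; bits shifted off the low
    # end wrap around to the high end.
    n %= 32
    n = int32(n)
    return (b >> n) | (b << (int32(32)-n))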

def reverse(b):
    r = zero
    for i in range(32):
        r = r << one
        if b & one:
            r = r ^ one
        b = b >> one
    return r

def swap(b):
    return reverse(rot(b, int32(20))) 

masks_move = (~int32(2**1 + 2**5 + 2**9 + 2**11 + 2**17 + 2**25 + 2**31),
        ~int32(2**2 + 2**5 + 2**10 + 2**11 + 2**18 + 2**25 + 2**26 + 2**31),
        ~int32(2**0 + 2**1 + 2**6 + 2**9 + 2**12 + 2**17 + 2**18 + 2**25),
        ~int32(2**0 + 2**2 + 2**6 + 2**10 + 2**12 + 2**18 + 2**26))
masks_jump = (~int32(2**0 + 2**4 + 2**8 + 2**10 + 2**16 + 2**24 + 2**30),
            ~int32(2**3 + 2**4 + 2**19 + 2**24 + 2**27 + 2**30),
            ~int32(2**7 + 2**8 + 2**13 + 2**16 + 2**19 + 2**24),
            ~int32(2**1 + 2**3 + 2**7 + 2**11 + 2**13 + 2**19 + 2**27))
masks = masks_move + masks_jump
promote = ~int32(2**5 + 2**11 + 2**25 + 2**31)


def move_boards(m1, k1, m2, k2):
    ally = m1 | k1
    enemy = m2 | k2
    every = ally | enemy

    enemy_0 = rot(enemy, one)
    enemy_4 = rot(enemy_0, one)
    enemy_1 = rot(enemy_4, int32(5))
    enemy_5 = rot(enemy_1, int32(7))
    enemy_7 = rot(enemy_5, int32(4))
    enemy_2 = rot(enemy_7, int32(7))
    enemy_6 = rot(enemy_2, int32(5))
    enemy_3 = rot(enemy_6, one)

    ally_0 = rot(ally, one)
    ally_4 = rot(ally_0, one)
    ally_1 = rot(ally_4, int32(5))
    ally_5 = rot(ally_1, int32(7))
    ally_7 = rot(ally_5, int32(4))
    ally_2 = rot(ally_7, int32(7))
    ally_6 = rot(ally_2, int32(5))
    ally_3 = rot(ally_6, one)

    board0 = ally & ~(ally_0 | enemy_0) & masks[0]
    board1 = ally & ~(ally_1 | enemy_1) & masks[1]
    board2 = ally & ~(ally_2 | enemy_2) & masks[2]
    board3 = ally & ~(ally_3 | enemy_3) & masks[3]

    board4 = ally & enemy_0 & ~ally_4 & masks[0] & masks[4]
    board5 = ally & enemy_1 & ~ally_5 & masks[1] & masks[5]
    board6 = ally & enemy_2 & ~ally_6 & masks[2] & masks[6]
    board7 = ally & enemy_3 & ~ally_7 & masks[3] & masks[7]

    return (board0, board1, board2, board3, board4, board5, board6, board7)

def actions(m1, k1, m2, k2):
    boards = move_boards(m1, k1, m2, k2)
    # Union of all move planes: bit i is set if square i has any move at all.
    every = zero
    for board in boards:
        every |= board
    legal = []
    place = one
    for i in range(32):
        # place has a single bit set, so a nonzero AND means that bit is present.
        if place & every:
            for b, board in enumerate(boards):
                if place & board:
                    legal.append((i, b))
        place <<= one
    return legal
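
The loop in actions above tests all 32 bit positions against every plane. As a point of comparison, here is a minimal sketch that visits only the set bits by repeatedly isolating the lowest one. It assumes the planes are plain non-negative Python ints rather than np.uint32 values, and the names iter_set_bits and actions_from_planes are hypothetical.

```python
def iter_set_bits(plane):
    """Yield the index of each set bit of a non-negative int, lowest first."""
    while plane:
        lowest = plane & -plane        # isolates the lowest set bit
        yield lowest.bit_length() - 1  # index of that bit
        plane ^= lowest                # clear it and continue


def actions_from_planes(planes):
    """Return (square, direction) pairs for every set bit in every plane.

    Sorting restores the square-major order that a position-by-position
    scan would produce.
    """
    pairs = [(i, b) for b, plane in enumerate(planes) for i in iter_set_bits(plane)]
    return sorted(pairs)
```

The work done is proportional to the number of available moves instead of 32 times the number of planes. Whether that matters in practice would need to be measured.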

I tested both by having them 'roll out' to a terminal state a thousand times in a row. The first implementation was 3-4 times faster, and the ratio seemed to hold no matter how many games I ran.
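
For reference, a rollout benchmark of the kind described could be sketched as below. This is not the test code that was actually used. The function names and the callback interface (legal_moves, apply_move) are assumptions, and each implementation would need a thin adapter that returns an empty list once a game is over, including draws by move counter.

```python
import random
import time


def rollout(state, legal_moves, apply_move, rng):
    """Play uniformly random moves until none remain; return the ply count."""
    plies = 0
    moves = legal_moves(state)
    while moves:
        state = apply_move(state, rng.choice(moves))
        plies += 1
        moves = legal_moves(state)
    return plies


def time_rollouts(initial_state, legal_moves, apply_move, n_games=1000, seed=0):
    """Run n_games random rollouts; return (elapsed seconds, total plies)."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    total_plies = 0
    start = time.perf_counter()
    for _ in range(n_games):
        total_plies += rollout(initial_state, legal_moves, apply_move, rng)
    elapsed = time.perf_counter() - start
    return elapsed, total_plies
```

Dividing the elapsed time by the total ply count gives a per-move cost. That is a fairer basis for comparison than time per game, since random games differ in length.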

basket
  • `states += (new_state,)` is really slow, copying the entire tuple every time. Use a list and `states.append(new_state)`. Also, all this int32 construction might be slower than just using Python ints. – Ry- Apr 13 '19 at 05:07
  • @Ry- I updated the newer code to reflect this and other memoization improvements. It's still slower, although less so. I don't see why using np.uint32 would be slower, since it apparently uses the C-level type, and I need something with 32 bits for the math to be convenient and waste-free. – basket Apr 13 '19 at 06:03
  • Is `swap` unused after the latest changes? Anyway, all those fixed rotations by 1, 5, 7, 4, … can have the `n %= 32` and `n = int32(n)` steps skipped. You might even just want to inline the rotation completely so there’s no subtraction and no function call. – Ry- Apr 13 '19 at 06:08
  • I removed the n %= 32, which would have resulted in a fatal error because it would recast n and then the boards to a different type. The new code is about 3 times faster than the old, when measured by generating moves from the start position, adding the state to a list, and popping it over and over. – basket Apr 13 '19 at 17:11
  • @Ry- So yeah, I rewrote it with Python's default ints and other improvements and it's several times faster than the old code, so you were right about the numpy ints. – basket Apr 15 '19 at 03:02
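
The rewrite with default Python ints mentioned in the last comment is not shown in the question. As a rough sketch of what the rotation and mask inversion might look like with plain ints, assuming the same right-rotation convention as rot above, the names MASK32 and bit_not are hypothetical:

```python
MASK32 = 0xFFFFFFFF  # keeps results within 32 bits, since Python ints are unbounded


def rot(b, n):
    """Rotate the low 32 bits of b right by n places (0 <= n < 32)."""
    return ((b >> n) | (b << (32 - n))) & MASK32


def bit_not(b):
    """32-bit complement; a bare ~b would give a negative Python int."""
    return ~b & MASK32
```

Because Python ints never overflow, the explicit mask takes over the wrap-around behaviour that np.uint32 provided implicitly. A mask such as promote could then be written as bit_not(2**5 + 2**11 + 2**25 + 2**31).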
