I'm currently working on improving a chess engine in Python. More specifically, I'm working on move generation for sliding pieces such as rooks and bishops. Originally I used a loop function like this one for both rook and bishop moves:
def get_rook_moves(self, color, position_bitboard):
    # Refresh the occupancy bitboards; position_bitboard holds the rook's square
    self.update_occupied_sqaures()
    original_position_bitboard = position_bitboard
    moves = 0
    rank_8 = 0xff00000000000000
    rank_1 = 0x00000000000000ff
    file_a = 0x0101010101010101
    file_h = 0x8080808080808080
    enemy_occupied_sqaures = self.occupied_black_sqaures if color == 'w' else self.occupied_white_sqaures
    friendly_occupied_sqaures = self.occupied_white_sqaures if color == 'w' else self.occupied_black_sqaures
    # north
    while not position_bitboard & rank_8:
        position_bitboard <<= 8
        if position_bitboard & enemy_occupied_sqaures:
            moves |= position_bitboard
            break
        elif position_bitboard & friendly_occupied_sqaures:
            break
        moves |= position_bitboard
    position_bitboard = original_position_bitboard
    # south
    while not position_bitboard & rank_1:
        position_bitboard >>= 8
        if position_bitboard & enemy_occupied_sqaures:
            moves |= position_bitboard
            break
        elif position_bitboard & friendly_occupied_sqaures:
            break
        moves |= position_bitboard
    position_bitboard = original_position_bitboard
    # east
    while not position_bitboard & file_h:
        position_bitboard <<= 1
        if position_bitboard & enemy_occupied_sqaures:
            moves |= position_bitboard
            break
        elif position_bitboard & friendly_occupied_sqaures:
            break
        moves |= position_bitboard
    position_bitboard = original_position_bitboard
    # west
    while not position_bitboard & file_a:
        position_bitboard >>= 1
        if position_bitboard & enemy_occupied_sqaures:
            moves |= position_bitboard
            break
        elif position_bitboard & friendly_occupied_sqaures:
            break
        moves |= position_bitboard
    # Return the final bitboard containing the rook's pseudo-legal moves
    return moves
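The four loops differ only in the shift and in the edge that stops the walk, so the same logic can be driven by a small table. This is a minimal sketch, assuming a standalone function instead of the method above; `rook_moves_loop`, `DIRECTIONS`, and the `friendly`/`enemy` parameters are names made up for illustration.

```python
RANK_1 = 0x00000000000000FF
RANK_8 = 0xFF00000000000000
FILE_A = 0x0101010101010101
FILE_H = 0x8080808080808080

# (shift, edge mask that stops the walk); a negative shift means a right shift
DIRECTIONS = (
    (8, RANK_8),    # north
    (-8, RANK_1),   # south
    (1, FILE_H),    # east
    (-1, FILE_A),   # west
)


def rook_moves_loop(position_bitboard, friendly, enemy):
    """Walk each ray one square at a time until an edge or a blocker is hit."""
    moves = 0
    for shift, edge in DIRECTIONS:
        square = position_bitboard
        while not square & edge:
            square = square << shift if shift > 0 else square >> -shift
            if square & friendly:
                break            # own piece: stop before this square
            moves |= square
            if square & enemy:
                break            # capture: include this square, then stop
    return moves
```

For a bishop the same loop would need diagonal shifts (7 and 9) with combined rank and file edge masks; that variant is not shown here.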
But after looking at more efficient methods I settled on two: subtraction fill and Kogge-Stone fill. Both use a single fill algorithm combined with rotations to generate a fill for each direction.
Subtraction fill:
def east_fill(position_bitboard, occupied_sqaures):
    # h-file mask (same value as file_h in the loop version)
    h = 0x8080808080808080
    # consolidate piece position, occupied squares, and h-file, 2 operations
    occInclRook = position_bitboard | occupied_sqaures | h
    # clear the h-file from the piece position and XOR with the consolidation from the previous line, 4 operations
    occExclRook = (position_bitboard & ~h) ^ occInclRook
    # subtract piece position from the previous line and XOR with the consolidation, 2 operations
    rookAttacks = (occExclRook - position_bitboard) ^ occInclRook
    # friendly pieces are not XORed out of rookAttacks here (that step would be 2 more operations)
    return rookAttacks
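To check the subtraction trick in isolation, here is a self-contained sketch for a single rook. It differs from the function above in one assumed detail: the h-file is cleared from the rook before the subtraction as well, because otherwise a rook standing on the h-file gets its own square back as an "attack". The names `east_attacks` and `FILE_H` are illustrative only.

```python
FILE_H = 0x8080808080808080


def east_attacks(position_bitboard, occupied):
    """East ray for one rook, up to and including the first blocker."""
    # A rook on the h-file has nothing to its east, so drop it entirely.
    rook = position_bitboard & ~FILE_H
    # The h-file acts as a guaranteed blocker at the end of every rank.
    occ_incl_rook = rook | occupied | FILE_H
    occ_excl_rook = rook ^ occ_incl_rook
    # The borrow from the subtraction ripples up to the nearest blocker.
    return (occ_excl_rook - rook) ^ occ_incl_rook
```

The result includes the first blocker regardless of its color, so a caller would still mask with `& ~friendly` to drop captures of its own pieces.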
Kogge-Stone fill:
def north_fill(position_bitboard, occupied_sqaures, enemy_occupied_sqaures):
    # fill north from the piece, 6 operations
    fillnorth = position_bitboard
    fillnorth |= fillnorth << 8
    fillnorth |= fillnorth << 16
    fillnorth |= fillnorth << 32
    # generate blockers based on the first fill and determine the closest blocker, 2 operations
    blockers = fillnorth & occupied_sqaures
    closest_blocker = blockers & -blockers
    # back fill north starting from the closest blocker, 6 operations
    backfillnorth = closest_blocker
    backfillnorth |= backfillnorth << 8
    backfillnorth |= backfillnorth << 16
    backfillnorth |= backfillnorth << 32
    # AND out the complement of the back fill, 5 operations
    fillnorth &= ~(backfillnorth ^ (closest_blocker & enemy_occupied_sqaures))
    return fillnorth
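For comparison, the Kogge-Stone occluded fill is usually written with the empty squares as a "propagator", which removes the blocker search and the back fill. The sketch below is that textbook variant, not the function above, and the names `north_occluded`, `north_attacks`, and `MASK64` are assumptions for illustration. It also masks to 64 bits, since Python integers do not wrap and a plain `<<` keeps growing past bit 63.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def north_occluded(generator, propagator):
    """Flood the generator north through squares set in the propagator."""
    generator |= propagator & (generator << 8)
    propagator &= propagator << 8
    generator |= propagator & (generator << 16)
    propagator &= propagator << 16
    generator |= propagator & (generator << 32)
    return generator & MASK64


def north_attacks(position_bitboard, occupied):
    """North ray up to and including the first blocker, excluding the origin."""
    empty = ~occupied & MASK64
    # Shifting the occluded fill one more step adds the blocker square
    # and drops the origin square.
    return (north_occluded(position_bitboard, empty) << 8) & MASK64
```

As with the subtraction version, the first blocker is included whatever its color, so friendly pieces are removed afterwards with `& ~friendly`. The other seven directions follow the same pattern with different shifts plus a file mask to stop wrap-around, which avoids rotations entirely.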
I used these with a combination of three rotations per direction for the subtraction fill and four rotations per direction for the Kogge-Stone fill. While the individual algorithms were faster than the loop fill in one direction, when doing all directions the Kogge-Stone and subtraction fills were both 13 nanoseconds slower using correct rotations, and 3 nanoseconds slower using the fastest rotations.
How a rotated direction looks:
fos = r.flipVertical(occupied_sqaures)
fes = r.flipVertical(enemy_occupied_sqaures)
fpb = r.flipVertical(position_bitboard)
fillsouth = r.flipVertical(north_fill(fpb, fos, fes))
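The implementation of `r.flipVertical` is not shown, so the following is only a sketch of one way a vertical flip can be done in Python. Flipping vertically swaps rank 1 with rank 8, rank 2 with rank 7, and so on, which is a byte reversal of the 64-bit board. `flip_vertical` and `south_from_north` are hypothetical helpers, and `north_attacks` stands for any north-ray function taking a position and an occupancy.

```python
def flip_vertical(bitboard):
    """Reverse the eight rank bytes of a 64-bit board."""
    # to_bytes raises OverflowError if the value does not fit in 8 bytes,
    # so the input must already be masked to 64 bits and non-negative.
    return int.from_bytes(bitboard.to_bytes(8, "little"), "big")


def south_from_north(north_attacks, position_bitboard, occupied):
    """Build a south ray by flipping the inputs, filling north, and flipping back."""
    flipped = north_attacks(flip_vertical(position_bitboard),
                            flip_vertical(occupied))
    return flip_vertical(flipped)
```

Each derived direction costs three flips here (two inputs, one output), which is worth comparing against the per-flip timings when deciding whether a rotation is cheaper than a dedicated fill.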
I got these results using the fastest rotation and reducing the number of rotations from 4 to 3 on the subtraction fill:
subtraction_fill average time (microseconds): 1.4904390062854784
kogge_fill average time (microseconds): 2.3933798188355007
loop average time (microseconds): 2.7865687436801885

subtraction fill all directions average time (microseconds): 3.5131265412457604
kogge fill all directions average time (microseconds): 6.532563156101725
loop all directions average time (microseconds): 2.8232414561724983

90 degree clockwise rotation average time (microseconds, calling other rotations): 1.3400337985097461
90 degree counter clockwise rotation (microseconds, calling other rotations): 1.3546858758556926
mirror rotation average time (microseconds): 0.711181154469959
vertical flip rotation average time (microseconds): 0.6649066519999233
diagnal flip A8H1 rotation average time (microseconds): 0.9260225979791413
diagnal flip A1H8 time (microseconds): 0.8079409734359478
And when removing all rotations from both the Kogge-Stone and subtraction fills, I get this:
subtraction fill all directions average time (microseconds): 1.6138958483908323
kogge fill all directions average time (microseconds): 2.5905292420269417
loop all directions average time (microseconds): 2.569006541722452
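Timings at this scale can be sensitive to noise and to how the calls are wrapped, so the harness matters. The harness used for the numbers above is not shown; as an assumption, a per-call figure could be taken with the standard `timeit` module roughly like this. `average_microseconds` is a hypothetical helper name.

```python
import timeit


def average_microseconds(func, *args, number=100_000, repeat=5):
    """Average time per call in microseconds, taken from the fastest of several runs."""
    timer = timeit.Timer(lambda: func(*args))
    # Each entry is the total time in seconds for `number` calls; the minimum
    # is the run least disturbed by other activity on the machine.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6
```

The lambda wrapper adds the same small overhead to every measurement, so comparisons between fills stay fair as long as all of them go through the same helper.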
I'm just unsure whether I implemented the rotations efficiently, or whether it would be better to do no rotations at all. Also, sorry if the question is confusing; it's my first time posting one of these and the topic itself is kind of confusing.