3

When the chessboard is stored in a variety of bitboards, how do modern chess engines recognise what type/side piece is situated on a particular cell? I'm having problems with this, since to find out what type/side piece a particular bit is, I have to always do:

if((bit_check & occupied) == 0ULL) ... // empty
else if((bit_check & white) != 0ULL) // white
    if((bit_check & white_pawns) != 0ULL) ... // white pawns
    else if((bit_check & white_rooks) != 0ULL) ... // white rooks
    ....
    else if((bit_check & white_kings) != 0ULL) ... // white kings
else if((bit_check & black) != 0ULL) // black
    if((bit_check & black_pawns) != 0ULL) ... // black pawns
    ....
    else if((bit_check & black_kings) != 0ULL) ... // black kings

This is quite a tedious process, and it has to be done quite a few times (for example, during move generation to see what is being captured). I'm not sure whether I should just go with this, or whether it would be faster to simply keep a 64-element array, Piece[64], which would store the piece type directly.

Which would be better, considering it will have to be done millions of times for capture analysis in the search functions? Am I doing this wrong?

Shreyas

2 Answers

3

The bit check itself is fast; I'd be worried mostly about the branching.

Instead, consider uint64_t bitboards[12] as an array of the 12 bitboards for all pieces. This is now contiguous in memory and can be scanned in a loop:

for (int i = 0; i != 12; ++i)
{
  if (bitboards[i] & bit_check) return i; // bitwise AND: is this square's bit set?
}
return -1; // empty.

Having only two branches (loop and check) is easier on the branch predictor, and the contiguous memory helps the prefetcher.

Obvious variations are checking bitboards[0] to [5] for white pieces only and [6] to [11] for black pieces only.
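As a rough, self-contained sketch of the loop and its per-color variations: the Board struct, the function names, and the piece order within each color are assumptions made for illustration, not something the question specifies.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (hypothetical): indices 0-5 hold the white piece bitboards,
// indices 6-11 hold the black ones.
struct Board {
    std::uint64_t bitboards[12] = {};
};

// Returns the index of the first bitboard in [first, last) that has the
// square's bit set, or -1 if none of them does.
int piece_in_range(const Board& b, int square, int first, int last) {
    assert(square >= 0 && square < 64);
    const std::uint64_t bit_check = std::uint64_t{1} << square;
    for (int i = first; i != last; ++i) {
        if (b.bitboards[i] & bit_check) return i;
    }
    return -1; // empty (within the scanned range)
}

int piece_at(const Board& b, int square)       { return piece_in_range(b, square, 0, 12); }
int white_piece_at(const Board& b, int square) { return piece_in_range(b, square, 0, 6); }
int black_piece_at(const Board& b, int square) { return piece_in_range(b, square, 6, 12); }
```

If you already know which side is being captured, the color-restricted versions halve the worst-case scan.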

A more subtle variant:

uint64_t bitboards[13];
bitboards[12] = ~uint64_t(0);
for (int i = 0; /* NO TEST*/ ; ++i)
{
     if (bitboards[i] & bit_check) return i; // always terminates: bitboards[12] has every bit set
}

Instead of returning -1 for empty, this will return 12 (the sentinel value). However, this replaces the conditional loop branch with a faster unconditional branch. It also means that the return value is always int i.
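A self-contained sketch of the sentinel variant might look like the following. The SentinelBoard struct, the kEmpty constant, and the function name are hypothetical names introduced here; whether this is actually faster than the bounded loop is something to measure on your own engine.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kEmpty = 12; // index of the sentinel bitboard (assumed name)

struct SentinelBoard {
    std::uint64_t bitboards[13] = {};
    // The sentinel has every bit set, so the scan below always finds a match.
    SentinelBoard() { bitboards[kEmpty] = ~std::uint64_t{0}; }
};

// Returns 0-11 for a piece bitboard, or kEmpty (12) if the square is empty.
int piece_at_sentinel(const SentinelBoard& b, int square) {
    assert(square >= 0 && square < 64);
    const std::uint64_t bit_check = std::uint64_t{1} << square;
    for (int i = 0; /* no test: the sentinel stops the loop */; ++i) {
        if (b.bitboards[i] & bit_check) return i;
    }
}
```

The caller then compares the result against kEmpty instead of -1.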

Another unrelated optimization is to recognize that pawns are the most common pieces, so it's more efficient to use bitboards[0] for white pawns and either bitboards[1] or bitboards[6] for black pawns, depending on whether you interleave black or white pieces.

[edit] If you have a separate bitboard for color, you do not need separate bitboards for white pawns and black pawns. Instead, keep a single bitboard for pawns. To check for a black pawn, AND the values together: (bit_check & color & bitboards[0]). To check for a white pawn, invert the color: (bit_check & ~color & bitboards[0]).
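To make the color-bitboard idea concrete, here is a minimal sketch. It assumes a set bit in the color bitboard means "black piece on this square"; the ColorBoard struct, the PieceType enum, and the function names are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum PieceType { Pawn, Knight, Bishop, Rook, Queen, King, NoPiece };

struct ColorBoard {
    std::uint64_t by_type[6] = {}; // one bitboard per piece type, both colors mixed
    std::uint64_t black = 0;       // assumption: set bit = black piece on that square
};

inline std::uint64_t square_bit(int square) {
    assert(square >= 0 && square < 64);
    return std::uint64_t{1} << square;
}

bool is_black_pawn(const ColorBoard& b, int square) {
    return (square_bit(square) & b.black & b.by_type[Pawn]) != 0;
}

bool is_white_pawn(const ColorBoard& b, int square) {
    return (square_bit(square) & ~b.black & b.by_type[Pawn]) != 0;
}

// Scans only 6 bitboards instead of 12; color is read separately from b.black.
PieceType type_at(const ColorBoard& b, int square) {
    const std::uint64_t bit = square_bit(square);
    for (int t = Pawn; t != NoPiece; ++t) {
        if (b.by_type[t] & bit) return static_cast<PieceType>(t);
    }
    return NoPiece;
}
```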

MSalters
  • I've already used the simple optimisations you mentioned. That is, I checked if it is empty first, followed by pawns, then rooks, knights, bishops, queens and then finally for kings. Nevertheless, I don't see how I could have missed it. Great solution. – Shreyas Jul 29 '15 at 08:56
  • I think you mean i++, by the way. bitboards[0] isn't being checked at all with ++i! – Shreyas Jul 29 '15 at 10:57
  • @ShreyasVinod: You may want to pull out your C++ book again. The expression used inside the loop is just plain `i`. And the `for` loop ignores the value of the third part; it only checks the boolean value of the middle part. I could write `for(int i = 0; i != 12; ++i * 7)` and it would do exactly the same. Multiplying an unused value by 7 still leaves it unused. But if you don't believe me, add a `std::cout << i << std::endl;` inside the loop. – MSalters Jul 29 '15 at 11:07
  • Huh, interesting, you're right, my bad. I expected it to increment prior to the condition check. – Shreyas Jul 29 '15 at 12:40
  • First, you never actually need to check all 12 bitboards in one loop... In practice, this is a loop of 0, 1, 2, 3, 4. I just unroll it but even if you don't the optimizer would, so avoiding the conditional doesn't actually help. – VoidStar Jul 29 '15 at 20:40
  • @VoidStar: Keep in mind that unrolling a loop with early exit (`return i`) is possible but not so trivial. I'll happily let the compiler decide if that's worth it. – MSalters Jul 30 '15 at 07:35
1

This is the slowest operation for a bitboard. However, you rarely have to perform it.

I see you are maintaining a bitwise OR of all white pieces (white) and a bitwise OR of all black pieces (black). Using those, you can quickly reject moves onto your own pieces and easily detect captures.

In the somewhat unlikely event of a capture, you have to test for up to 5 of the 6 enemy piece bitboards, because king capture should have already been ruled out. Also, this is not as tedious as you imagine; on a 64 bit system, each mask is only 1 operation per bitboard and then a compare, so 10 integer operations. And/Or are some of the lightest operations on the processor. Maintaining the Piece[64] alone costs more time than this does.
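A rough sketch of that capture check, under some assumptions: the enemy's five non-king bitboards are passed as an array, enemy_all is the precomputed OR of the enemy pieces, and the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Returns the index (0-4) of the captured enemy piece bitboard, or -1 when
// the destination square holds no capturable enemy piece.
int captured_piece(const std::uint64_t enemy[5],
                   std::uint64_t enemy_all,
                   std::uint64_t to_bit) {
    // Precondition: exactly one bit set in the destination mask.
    assert(to_bit != 0 && (to_bit & (to_bit - 1)) == 0);

    // Early out for the common case: a quiet move costs one AND and a compare.
    if ((to_bit & enemy_all) == 0) return -1;

    // Capture: at most 5 more AND/compare pairs.
    for (int i = 0; i != 5; ++i) {
        if (enemy[i] & to_bit) return i;
    }
    return -1; // only reachable for the king, which should be ruled out earlier
}
```

The early out is what keeps the per-move cost low: the five-way scan only runs when the destination bit overlaps the enemy's combined bitboard.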

I believe there is no other case (within the engine code) where you need to get a pieceID from a given square.

The major advantage of the bitboards is the move generation and positional analysis. There is nothing that compares, so you'll be maintaining this structure no matter what.

VoidStar
  • If you look at the code, `white` and `black` are already in the mix. And I agree, bitwise operations are a breeze for modern ALUs. – Shreyas Jul 29 '15 at 07:35
  • Hm, thinking about it a little bit, you're quite right as captures are somewhat rare in the move generation process. – Shreyas Jul 29 '15 at 07:48
  • I'm not sure whether it's efficient to have a bitboard for "all white pieces" since it's trivially generated. – MSalters Jul 29 '15 at 08:23
  • It is, I've measured it different ways. It gets read as part of a number of early outs and other cases, and in the positional analysis code too. – VoidStar Jul 29 '15 at 09:46