I think what you are describing is already the most efficient way. Looping over the bitboard until it is zero and pick one move at a time.
To sketch the idea with some code, it could look like this:
using Bitboard = uint64_t; // 64 bit unsigned integer
pMoves createAllMoves(Bitboard mask, int from_sq, Move* pMoves) {
while(moves != 0) {
int to_sq = findAndClearSetBit(mask);
*pMoves++ = createMove(from_sq, to_sq);
}
return pMoves;
}
The findAndClearSetBit
function can choose any set bit, but commonly on today's hardware, finding the least significant bit is most efficient. If you are using GCC or Clang, you can use __builtin_ctzll
which should be optimized to the specific hardware:
int findAndClearSetBit(Bitboard& mask) {
int sq = __builtin_ctzll(mask); // find least significant bit
mask &= mask - 1; // clear least significant bit
return sq;
}
If I am not mistaken, your existing function bitScanForward
is already an implementation to find the least significant bit. So, you can use it to get a portable version.