For a puzzle-solver I'm writing, I'm looking for the fastest algorithm (minimal number of bit operations) to transpose a 5x5 bitboard with 2 bits per square in the puzzle, so:
00 01 02 03 04
05 06 07 08 09
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
becomes
00 05 10 15 20
01 06 11 16 21
02 07 12 17 22
03 08 13 18 23
04 09 14 19 24
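In index terms, the square at row r, column c (index 5*r + c) moves to index 5*c + r. As a baseline to check any faster version against, here is a minimal loop-based sketch. The name `transpose_ref` and the `bits_per_square` parameter are my own placeholders, and it assumes square i is stored in the `bits_per_square` bits starting at bit i * bits_per_square:

```c
#include <stdint.h>

/* Reference (slow) transpose of a 5x5 board packed into a uint64_t.
   Assumption: square i occupies bits [i * bits_per_square, (i + 1) * bits_per_square).
   Intended for bits_per_square = 1 (25 bits) or 2 (50 bits). */
uint64_t transpose_ref(uint64_t state, unsigned bits_per_square) {
    const uint64_t square_mask = (UINT64_C(1) << bits_per_square) - 1;
    uint64_t out = 0;
    for (unsigned r = 0; r < 5; ++r) {
        for (unsigned c = 0; c < 5; ++c) {
            unsigned from = (5 * r + c) * bits_per_square; /* source square   */
            unsigned to   = (5 * c + r) * bits_per_square; /* mirrored square */
            out |= ((state >> from) & square_mask) << to;
        }
    }
    return out;
}
```

With one bit per square, for example, the top row `0x1F` should come back as the left column `0x108421`.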
The best I've been able to come up with is
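uint64_t transpose(uint64_t state) {
    /* Masks below cover one bit per square: square 5*r + c moves to
       5*c + r, i.e. a shift of 4*(c - r). For 2 bits per square, double
       each shift and widen every mask bit to a pair of bits. */
    return (state & 0x1041041) |           /* main diagonal stays in place */
           ((state >> 16) & 0x10) |
           ((state >> 12) & 0x208) |
           ((state >> 8) & 0x4104) |
           ((state >> 4) & 0x82082) |
           ((state << 4) & 0x820820) |
           ((state << 8) & 0x410400) |
           ((state << 12) & 0x208000) |
           ((state << 16) & 0x100000);
}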
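One alternative formulation I can sketch uses the delta-swap idiom, where each step exchanges the bits selected by a mask with the bits a fixed distance above them. The names `delta_swap` and `transpose_delta` are my own, and it assumes one bit per square with square i in bit i, matching the masks above. It costs six operations per swap, 24 in total, so it is a different way of writing the same thing rather than a clear improvement:

```c
#include <stdint.h>

/* Swap bit i with bit i + shift for every bit i set in mask. */
static uint64_t delta_swap(uint64_t x, uint64_t mask, unsigned shift) {
    uint64_t t = ((x >> shift) ^ x) & mask; /* 1 where the two bits differ */
    return x ^ t ^ (t << shift);            /* flip both bits of each differing pair */
}

/* Transpose a 5x5 board, one bit per square (assumption).
   Each mask holds the lower square of every pair that is 4*k bits apart;
   the pairs are disjoint, so the order of the four swaps does not matter. */
uint64_t transpose_delta(uint64_t state) {
    state = delta_swap(state, 0x82082, 4);  /* pairs one off the diagonal   */
    state = delta_swap(state, 0x4104, 8);   /* pairs two off the diagonal   */
    state = delta_swap(state, 0x208, 12);   /* pairs three off the diagonal */
    state = delta_swap(state, 0x10, 16);    /* the two remaining corners    */
    return state;
}
```

Diagonal squares are never selected by any mask, so they stay where they are without a separate term.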
But it feels like this can be done with significantly fewer operations. Does anyone know a faster solution? References to good literature on the subject are also very welcome.