1

For a puzzle-solver I'm writing, I'm looking for the fastest algorithm (minimal number of bit operations) to transpose a 5x5 bitboard with 2 bits per square in the puzzle, so:

00 01 02 03 04
05 06 07 08 09
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24

becomes

00 05 10 15 20
01 06 11 16 21
02 07 12 17 22
03 08 13 18 23
04 09 14 19 24

The best I've been able to come up with is

uint64_t transpose(uint64_t state) {
    return ((state >> 16) &     0x10) |
           ((state >> 12) &    0x208) |
           ((state >>  8) &   0x4104) |
           ((state >>  4) &  0x82082) |
           ((state <<  4) & 0x820820) |
           ((state <<  8) & 0x410400) |
           ((state << 12) & 0x208000) |
           ((state << 16) & 0x100000);
}

But it feels like this can be done with significantly less operations. Does anyone know a faster solution? References to good literature on the subject are also very welcome.

Arthelais
  • 606
  • 1
  • 7
  • 17
  • 1
    It seems to me like you'd need a no-shift term in there. `transpose(1)` seems to return the wrong result. – user2357112 Mar 21 '15 at 22:51
  • 2
    Also, it looks like some of the constants in that `transpose` are based on a 1-bit-per-square bitboard. – user2357112 Mar 21 '15 at 22:55
  • Depending on what you do with it, you could use an additional flag that indicates, whether this should be read as a row major or colum major matrix. Thats something you often find in libraries/programs that work with large matricies: Instead of performing an actal, costly transform operation, they e.g. use a different multiplication algorithm. – MikeMB Mar 22 '15 at 02:44
  • There's probably a good delta-swap sequence for it – harold Mar 22 '15 at 11:42

1 Answers1

1

A 2048 AI found here has a transpose method for a 4x4 grid of 4 bit tiles. Perhaps you can adapt it to your situation. Here is the code:

// Transpose rows/columns in a board:
//   0123       048c
//   4567  -->  159d
//   89ab       26ae
//   cdef       37bf
uint64_t transpose(uint64_t x)
{
    uint64_t a1 = x & 0xF0F00F0FF0F00F0FULL;
    uint64_t a2 = x & 0x0000F0F00000F0F0ULL;
    uint64_t a3 = x & 0x0F0F00000F0F0000ULL;
    uint64_t a = a1 | (a2 << 12) | (a3 >> 12);
    uint64_t b1 = a & 0xFF00FF0000FF00FFULL;
    uint64_t b2 = a & 0x00FF00FF00000000ULL;
    uint64_t b3 = a & 0x00000000FF00FF00ULL;
    return b1 | (b2 >> 24) | (b3 << 24);
}