3

For the transposition table (generally a hash table) of a Connect Four game, I would like to use memory efficiently (to store as many elements as possible). One table element has to store the following information:

  • lock: unsigned 64 bit
  • move: [0..6] --> unsigned 3 bit
  • score: [-2000..2000] --> signed 12 bit
  • flag: VALID, UBOUND, LBOUND: --> unsigned 2 bit
  • height: [-1..42]: --> signed 7 bit

First I tried the following data structure, which needs 24 bytes:

struct TableEntry1
{
    unsigned __int64 lock;
    unsigned char move;
    short score;
    enum { VALID, UBOUND, LBOUND } flag;
    char height;
};

After rearranging the members it needs 16 bytes (I found the explanation for this behavior):

struct TableEntry2
{
    unsigned __int64 lock;
    enum { VALID, UBOUND, LBOUND } flag;
    short score;
    char height;
    unsigned char move;
};
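
The drop from 24 to 16 bytes comes from padding: the compiler places each member at a multiple of its alignment and rounds the struct size up to the largest member alignment. A minimal sketch that measures this, assuming `std::uint64_t` as a portable stand-in for `unsigned __int64` and `signed char` for `height`; the helper names `payload_bytes` and `padding_bytes` are hypothetical, and the exact numbers depend on compiler and target:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same member order as TableEntry1 in the question.
struct TableEntry1 {
    std::uint64_t lock;
    unsigned char move;
    short score;
    enum { VALID, UBOUND, LBOUND } flag;
    signed char height;
};

// Same member order as TableEntry2 in the question.
struct TableEntry2 {
    std::uint64_t lock;
    enum { VALID, UBOUND, LBOUND } flag;
    short score;
    signed char height;
    unsigned char move;
};

// Sum of the member sizes, ignoring any padding (hypothetical helper).
template <typename T>
constexpr std::size_t payload_bytes() {
    return sizeof(T::lock) + sizeof(T::move) + sizeof(T::score) +
           sizeof(T::flag) + sizeof(T::height);
}

// Bytes the compiler added between members and at the end of the struct.
template <typename T>
constexpr std::size_t padding_bytes() {
    return sizeof(T) - payload_bytes<T>();
}
```

Printing `sizeof`, `offsetof(TableEntry1, score)` and `padding_bytes<TableEntry1>()` for both structs shows where the extra bytes go on a given compiler.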

My last try was:

struct TableEntry3
{
    unsigned __int64 lock;
    unsigned int move:3;
    int score:12;
    enum { VALID, UBOUND, LBOUND } flag:2;
    int height:7;
};

This also needs 16 bytes. Is it possible to change the structure so that it only uses 12 bytes (on a 32-bit architecture)? Why doesn't the compiler make my last attempt 12 bytes long?

Thanks!

Edit: The property lock is a unique element ID used to detect hash collisions.

Christian Ammer
  • For a transposition table, wouldn't it make more sense to value speed over memory efficiency? Instead of saving the last 4 bytes, why not align the structs properly? It would be my estimate that this would give you an overall better performance. – Tommy Andersen Jan 10 '11 at 20:10
  • @TommyA: The more Connect Four game representations available within the transposition table, the fewer game representations have to be evaluated. Evaluation is more expensive. – Christian Ammer Jan 10 '11 at 21:03
  • For best packing and best performance without optimization, just list members in order of size, from largest to smallest. The compiler will then add the least padding. (Rule of thumb.) – Martin York Jan 10 '11 at 21:09
  • If you force optimize too much it may be fine for this architecture, but when you upgrade to your 64 bit Windows machine next year you may find that all your work is wasted as the minimum addressable block is 64 bits thus anything smaller than 8 bytes is padded. – Martin York Jan 10 '11 at 21:13

3 Answers

6

Yes, since you only have 88 bits of information it is possible to pack that into 96 bits (12 bytes); however, do you really need that? In the extreme case, remember that such packing can degrade runtime performance.

If you were storing gazillions of these to disk, considering tiny efficiencies earlier would make more sense, but is that the case here? Have you seen a problem in memory use? How much memory do you need, currently with 16 byte objects, and how close is that to your planned limit? Trying to optimize runtime memory use without answering these last two questions is premature.

That aside, I suspect your compiler is padding at the end of the struct so the __int64 is always aligned on an 8-byte boundary. Consider an array of these with length two: with a 12 byte size, at most one of the __int64 sub-objects could be 8-byte aligned.
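
One way to reach 12 bytes without relying on packing pragmas is to avoid any member that needs 8-byte alignment. The following is only a sketch under stated assumptions: the 64-bit lock is split into two 32-bit halves, and the remaining 24 bits of information are packed by hand into a third 32-bit word. The names `PackedEntry`, `pack`, `lock_of` and so on are hypothetical, and the bias encoding (score + 2000, height + 1) is one possible choice for avoiding sign handling:

```cpp
#include <cassert>
#include <cstdint>

enum Flag : std::uint32_t { VALID = 0, UBOUND = 1, LBOUND = 2 };

// Three 32-bit words: the largest alignment requirement is 4, so no
// tail padding is needed to keep array elements aligned.
struct PackedEntry {
    std::uint32_t lock_lo;  // low 32 bits of the lock
    std::uint32_t lock_hi;  // high 32 bits of the lock
    std::uint32_t data;     // bits 0-2 move, 3-14 score, 15-16 flag, 17-23 height
};
static_assert(sizeof(PackedEntry) == 12, "expected three 32-bit words");

// Assumes move in [0..6], score in [-2000..2000], height in [-1..42].
inline PackedEntry pack(std::uint64_t lock, unsigned move, int score,
                        Flag flag, int height) {
    PackedEntry e;
    e.lock_lo = static_cast<std::uint32_t>(lock);
    e.lock_hi = static_cast<std::uint32_t>(lock >> 32);
    e.data = (move & 0x7u)
           | (static_cast<std::uint32_t>(score + 2000) & 0xFFFu) << 3
           | (static_cast<std::uint32_t>(flag) & 0x3u) << 15
           | (static_cast<std::uint32_t>(height + 1) & 0x7Fu) << 17;
    return e;
}

inline std::uint64_t lock_of(const PackedEntry& e) {
    return (static_cast<std::uint64_t>(e.lock_hi) << 32) | e.lock_lo;
}
inline unsigned move_of(const PackedEntry& e) { return e.data & 0x7u; }
inline int score_of(const PackedEntry& e) {
    return static_cast<int>((e.data >> 3) & 0xFFFu) - 2000;
}
inline Flag flag_of(const PackedEntry& e) {
    return static_cast<Flag>((e.data >> 15) & 0x3u);
}
inline int height_of(const PackedEntry& e) {
    return static_cast<int>((e.data >> 17) & 0x7Fu) - 1;
}
```

The cost is the shift-and-mask work on every access and a lock comparison that reassembles two words, so whether this is a net win would need to be measured.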

Fred Nurk
  • Of course, disks are generally bigger than RAM too, so it really has to be gazillions of gazillions – the differences between them are too much to go into here. A bigger issue is perhaps that disk/storage structures are more permanent, while RAM/working structures can be more fluid. – Fred Nurk Jan 10 '11 at 20:15
  • Disk is not used. I see two advantages of minimizing the size: by reducing the size from 24 to 12 bytes, twice as many elements can be stored within the transposition table (I also use the symmetry property of Connect Four to reduce table elements). If a Connect Four board representation isn't available via the transposition table, it has to be evaluated expensively. Secondly, I suppose that the program becomes faster because more elements can be held in the fast CPU cache. – Christian Ammer Jan 10 '11 at 20:46
  • @Christian: Reducing memory use **always** *sounds* like a good idea, but have you seen a problem in memory use? how close is your current use (with 16 byte size) to your planned limit? If the answers are "no" and "well under", then, assuming the planned limit doesn't need to be revised, what problem are you solving? – Fred Nurk Jan 10 '11 at 21:03
  • @Christian: Fitting data in cache doesn't always increase speed: packing and unpacking those bitfields can be *slow,* and it will lead to increased code size, which may potentially cause more cache misses than otherwise. – Fred Nurk Jan 10 '11 at 21:07
3

You can reach 12 bytes by using non-standard constructs such as Visual Studio's #pragma pack:

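#pragma pack(push, 1)   // save the current packing, then pack to 1-byte boundaries
struct TableEntry3
{
    unsigned __int64 lock;
    unsigned int move:3;
    int score:12;
    enum { VALID, UBOUND, LBOUND } flag:2;
    int height:7;
};
#pragma pack(pop)       // restore the previous packing for later declarations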

Here, sizeof(TableEntry3) yields 12 for me.
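
Bit-field layout is implementation-defined, so the size of a packed struct with bit-fields may differ between compilers. A hedged variant that should give the same size on compilers supporting `#pragma pack(push, 1)` replaces the bit-fields with a single 32-bit word. `PragmaPackedEntry` and `lock_offset` are hypothetical names, and the fields would still have to be packed into `data` by hand:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)  // non-standard, but accepted by MSVC, GCC and Clang
struct PragmaPackedEntry {
    std::uint64_t lock;  // may end up at an address that is not a multiple of 8
    std::uint32_t data;  // move, score, flag and height packed manually
};
#pragma pack(pop)

static_assert(sizeof(PragmaPackedEntry) == 12, "8 + 4 bytes, no padding");

// Byte offset of the lock member of element i within an array of entries.
inline std::size_t lock_offset(std::size_t i) {
    return i * sizeof(PragmaPackedEntry) + offsetof(PragmaPackedEntry, lock);
}
```

With a 12-byte element size, `lock_offset(1)` is 12, so every second `lock` in an array is not 8-byte aligned. That is the trade-off packing makes here.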

icecrime
  • This pragma is also available for GCC, in case you are using it. – data Jan 10 '11 at 20:07
  • At the expense of making `lock` unaligned of course. To avoid that, `lock` could be stored in a separate table from the rest of the data. – Ben Voigt Jan 10 '11 at 20:09
  • Generally speaking, locks have to be aligned to work properly - atomic compare-and-swap tends to break when you try to use it spanning cache lines. :) – bdonlan Jan 10 '11 at 20:11
  • Borland (Embarcadero C++ Builder) also supports `#pragma pack`, as does Intel (obviously) :) – Tommy Andersen Jan 10 '11 at 20:12
  • In GCC, the recommended way is to use `__attribute__((aligned(1)))` attribute to specify full packing, see http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Type-Attributes.html . – Adam Rosenfield Jan 10 '11 at 20:13
  • @Adam removed the gcc part from my answer, I'm not familiar enough with it – icecrime Jan 10 '11 at 20:15
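
The comment above about storing `lock` in a separate table can be sketched as two parallel arrays, which keeps every `lock` aligned while still spending 12 bytes per entry. This is an illustration only: `SplitTable`, `store` and `probe` are hypothetical names, the replacement policy is "always overwrite", and a lock value of 0 is assumed never to be a valid key:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SplitTable {
public:
    // n must be greater than zero; all slots start with lock 0 ("empty").
    explicit SplitTable(std::size_t n) : locks_(n, 0), data_(n, 0) {}

    // Overwrites whatever was in the slot selected by the lock.
    void store(std::uint64_t lock, std::uint32_t packed) {
        const std::size_t i = slot(lock);
        locks_[i] = lock;
        data_[i] = packed;
    }

    // Returns true and fills packed_out only if the stored lock matches,
    // which is how a hash collision is detected.
    bool probe(std::uint64_t lock, std::uint32_t& packed_out) const {
        const std::size_t i = slot(lock);
        if (locks_[i] != lock) return false;
        packed_out = data_[i];
        return true;
    }

    static constexpr std::size_t bytes_per_entry() {
        return sizeof(std::uint64_t) + sizeof(std::uint32_t);
    }

private:
    std::size_t slot(std::uint64_t lock) const {
        return static_cast<std::size_t>(lock % locks_.size());
    }

    std::vector<std::uint64_t> locks_;  // 8-byte aligned keys
    std::vector<std::uint32_t> data_;   // packed move/score/flag/height
};
```

A probe now touches two separate memory regions, so it may cost an extra cache miss compared with a single array of structs.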
2

Depending on how you implement locks, it may be possible to reduce the size of your lock field (I'm assuming this is a typical lock on an SMP machine).

One option would be to reduce the number of locks. That is, have a separate, smaller array of locks, and lock an element derived from the index of the array element you are accessing. If you're accessing transposition table element t, use lock t % locktablesize. The lock table doesn't have to be nearly as big as the transposition table.

Using this approach, and your TableEntry2 field order, and assuming your lock table is half the size of the transposition table (which is probably bigger than necessary) you get down to 12 bytes without losing performance due to bitshift operations - this performance loss can be quite significant, so it's always helpful to be able to avoid it.
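
A minimal sketch of the `t % locktablesize` mapping, under the assumption that `lock` is a synchronization primitive (the question's edit describes it as a collision-detection ID instead, in which case this does not apply). `LockStripes`, `lock_index` and `kLockCount` are hypothetical, and `std::mutex` stands in for whatever lock type is actually used:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical lock table size; much smaller than the transposition table.
constexpr std::size_t kLockCount = 64;

// Maps transposition table index t to the lock that guards it.
inline std::size_t lock_index(std::size_t t) { return t % kLockCount; }

class LockStripes {
public:
    // Entries whose indices differ by a multiple of kLockCount share a lock.
    std::mutex& for_entry(std::size_t t) { return locks_[lock_index(t)]; }

private:
    std::array<std::mutex, kLockCount> locks_;
};
```

A caller would take `std::lock_guard<std::mutex> guard(stripes.for_entry(t));` before reading or writing entry `t`. Fewer locks save memory but increase the chance that two threads contend on unrelated entries.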

bdonlan