I am using a GUID (128 bits) as the key into a hashing function. For my table size I chose 16, so the function looks like

hash(guid) mod 16 which gives me the partition I need to use.

The feedback I am getting is to do something like this

GUID (without dashes, so essentially a number) mod 100,000 or higher, or simply mod a prime number like 17

The reason given is:

Using mod 16 on just the GUID number will use only its last 4 bits, so it is exposed to bias. Using 17 will not. Is there any concrete math behind this? Is it because 17 is 10001 and 16 is 10000 in binary?

If my hash function is murmur64, can I argue that hash(guid) % 16 will be good enough to give a good distribution?

Frank Q.
  • The idea behind the hash is to give you a uniform distribution of bits over the aggregate of your key space. The GUIDs might favor some bits over others, in the aggregate. Hashing should even it out so that each bit is used approximately the same number of times. If you do a `mod 16` on the hashes, you should get approximately the same number of items in each of your 16 buckets. If you do a `mod 16` on the GUIDs, you could very well end up with skewed buckets because those last 4 bits of the GUIDs aren't uniformly distributed. – Jim Mischel Nov 02 '16 at 21:12
  • Yes, but what is the reasoning behind the claim that `Guid mod 17` will result in a better distribution? – Frank Q. Nov 02 '16 at 22:03
  • If you use 16, only the bottom 4 bits will affect the result. If you use 17, *all* the bits will affect the result. If your hash function is cryptographic it is guaranteed to have a random distribution and taking only 4 bits will be acceptable. – Mark Ransom Nov 02 '16 at 22:28
  • P.S. Murmur is [not cryptographic](https://en.wikipedia.org/wiki/MurmurHash). – Mark Ransom Nov 02 '16 at 22:36
  • Can you explain this part: with 16 only the bottom 4 bits affect the result, but with 17 all of them do? – Frank Q. Nov 02 '16 at 22:49
  • Perhaps this explains it: http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html. The GUID generator is, in fact, a "stupid hashCode function." Using a prime number as the divisor involves all of the bits. However, that's still not as good as using a good hash function. If it were, then nobody would use a hash function for integer values: they'd just mod by a prime number. – Jim Mischel Nov 02 '16 at 23:55
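
The point made in the comments can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: the keys are a made-up worst case (integers whose low 4 bits are always zero), and `hashlib.blake2b` stands in for a mixing hash because murmur64 is not in the Python standard library. The names `bucket_mod` and `bucket_hashed` are hypothetical helpers, not part of any library. The arithmetic behind it: since 16 = 2^4, `x % 16` equals `x & 0xF`, so only the lowest hex digit matters; and since 16 ≡ -1 (mod 17), `x % 17` is the alternating sum of all hex digits of `x`, so every digit contributes.

```python
import hashlib
from collections import Counter


def bucket_mod(key: int, n: int) -> int:
    """Partition by taking the raw key modulo n (no hashing)."""
    return key % n


def bucket_hashed(key: int, n: int) -> int:
    """Partition by hashing the 128-bit key first, then taking modulo n.

    blake2b is an assumption here, used as a stand-in for murmur64.
    """
    digest = hashlib.blake2b(key.to_bytes(16, "big"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % n


# Hypothetical worst-case keys: the low 4 bits are always zero.
keys = [i << 4 for i in range(1600)]

# For non-negative integers, mod 16 is exactly "keep the low 4 bits".
assert all(k % 16 == k & 0xF for k in keys)

raw_16 = Counter(bucket_mod(k, 16) for k in keys)
raw_17 = Counter(bucket_mod(k, 17) for k in keys)
hashed_16 = Counter(bucket_hashed(k, 16) for k in keys)

print("raw key mod 16, buckets used:", len(raw_16))      # every key lands in bucket 0
print("raw key mod 17, buckets used:", len(raw_17))      # all 17 buckets are used
print("hashed key mod 16, buckets used:", len(hashed_16))
print("hashed key mod 16, bucket sizes:", sorted(hashed_16.values()))
```

With these keys, the raw `mod 16` puts everything into a single bucket, the raw `mod 17` spreads the keys out, and hashing before `mod 16` also spreads them out. A prime modulus protects against one kind of structure in the keys, while a well-mixing hash makes the choice of modulus matter much less.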
