Hashing google interview

Question

Why can't powers of 2, powers of 10, or prime numbers be good hashing functions? If we want to store overflow records in a hash table, why aren't those good choices when selecting a hashing function?

A number by itself is, well, a number, not a hashing function. Can you provide more context? Perhaps write a formula for your hashing function? — mvp, Sep 21 '14 at 07:03
I suppose you mean using powers of 2 and 10, and prime numbers as the modulus? gcc's implementation uses prime moduli - the modulus simply keeps increasing in accordance with the required number of buckets. — Pradhan, Sep 21 '14 at 07:03
Exactly right, as the modulus. Why doesn't selecting powers of 2, powers of 10, or prime numbers as the modulus of a hashing function yield the best memory management results? — Shrerocx, Sep 21 '14 at 07:08

score 4 · Answer 1 · answered Sep 21 '14 at 11:45

Suppose your hash function returns a 32-bit unsigned result, and you choose a modulus of 4096. What you effectively compute is index = hash & 0xFFF, so you throw away the top 20 bits of your hash value. If your hash is really good, and the bottom 12 bits are just as good as the rest, that's not a problem. However, if your hash is pretty good over all 32 bits but the bottom 12 bits are suspect (they might, for example, be more strongly influenced by the last characters of a string), then you may regret discarding the top 20. If you instead choose any odd modulus (greater than 1), then the result of index = hash % modulus depends on all 32 bits of the hash.
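To make the difference concrete, here is a minimal Python sketch (not part of the original answer). The helper name bits_that_affect_index, the two sample hash values, and the odd modulus 4093 are assumptions chosen purely for illustration. The sketch flips each bit of a 32-bit hash in turn and records which flips change the resulting index.

```python
def bits_that_affect_index(h: int, modulus: int, width: int = 32) -> list[int]:
    """Return the bit positions of h whose flip changes h % modulus."""
    base = h % modulus
    return [k for k in range(width) if (h ^ (1 << k)) % modulus != base]


# Two hypothetical 32-bit hash values that share their bottom 12 bits (0xABC)
# but differ in the top 20 bits.
h1 = 0x12345ABC
h2 = 0xFEDCBABC

# With a power-of-two modulus, the modulo is the same as masking,
# so the two values land in the same bucket.
same_bucket_pow2 = (h1 % 4096) == (h2 % 4096)
mask_equivalent = (h1 % 4096) == (h1 & 0xFFF)

# Only bits 0..11 can influence the index when the modulus is 4096.
bits_pow2 = bits_that_affect_index(h1, 4096)

# With an odd modulus (4093 is an arbitrary illustrative choice),
# flipping any one of the 32 bits changes the index.
bits_odd = bits_that_affect_index(h1, 4093)
same_bucket_odd = (h1 % 4093) == (h2 % 4093)

print("power of two:", bits_pow2, "collide:", same_bucket_pow2)
print("odd modulus: ", bits_odd, "collide:", same_bucket_odd)
```

The odd-modulus case works because flipping bit k changes the hash by 2^k, and 2^k is never a multiple of an odd number greater than 1, so the remainder always moves.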

So, more generally, if your hash is calculated modulo M, and your index is taken as hash % N, then what you want is for your M and N to be co-prime.

If M is 2^m (as it usually is), then N = 10^n is a poor choice: 10^n = 2^n × 5^n shares the factor 2^n with M, so the bottom n bits of the resulting index are a straight copy of the bottom n bits of the hash.
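The co-primality point can be checked numerically. The following is a small sketch under stated assumptions: M = 2^32, n = 3 (so N = 1000), and randomly generated 32-bit values standing in for real hash outputs. The helper low_bits is a hypothetical name introduced only for this example.

```python
import math
import random


def low_bits(x: int, n: int) -> int:
    """Return the bottom n bits of x."""
    return x & ((1 << n) - 1)


M = 2 ** 32   # hash values are computed modulo M (assumed 32-bit hash)
n = 3
N = 10 ** n   # table size under test: 1000 = 2**3 * 5**3

# M and N are not co-prime: they share the factor 2**n.
shared_factor = math.gcd(M, N)

# Stand-in hash values; a fixed seed keeps the run repeatable.
rng = random.Random(0)
samples = [rng.getrandbits(32) for _ in range(10_000)]

# The bottom n bits of the index are copied straight from the hash.
low_bits_copied = all(low_bits(h % N, n) == low_bits(h, n) for h in samples)

print("gcd(M, N) =", shared_factor)
print("bottom", n, "bits of index always equal those of hash:", low_bits_copied)
```

By contrast, any odd N gives gcd(M, N) = 1 when M is a power of two, which is the co-prime situation the answer recommends.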
