7

I have this formula:

index = (a * k) % M

which maps a number 'k', from an input set K of distinct numbers, into it's position in a hashtable. I was wondering how to write a non-brute force program that finds such 'M' and 'a' so that 'M' is minimal, and there are no collisions for the given set K.

Gene
  • 46,253
  • 4
  • 58
  • 96
Pavel
  • 1
  • 3
  • 17
  • 51
  • I'm assuming you have the set of all `k` available when finding these numbers? Otherwise I don't see how it would be possible to find the numbers under your constraints (namely no collisions). – Mark Peters Aug 09 '15 at 16:43
  • @JohnPirie I need answer for this specific question and not how to construct a good hash function. – Pavel Aug 09 '15 at 16:44
  • 3
    Look up [Perfect hash function](https://en.wikipedia.org/wiki/Perfect_hash_function). – Sergey Kalinichenko Aug 09 '15 at 16:44
  • Is `a` a constant given as an input to the algorithm? – Sergey Kalinichenko Aug 09 '15 at 16:45
  • @MarkPeters the input is {k1,k2,k3,k4,k5,...} so k refers to every element from that set. – Pavel Aug 09 '15 at 16:45
  • @dasblinkenlight no, 'a' needs to be found too – Pavel Aug 09 '15 at 16:46
  • Then I must not understand your question. Is `k` a positive integer? If so, the answer is trivial, with `a = 1` and `M = k`. Beyond integers, it depends on the domain. –  Aug 09 '15 at 16:46
  • 1
    @JohnPirie then why is M minimal in that case? – Pavel Aug 09 '15 at 16:47
  • Your problem is not completely stated. If you have a fixed set of keys k, then it's a complete problem. As @dasblinkenlight says, perfect hash functions are very well studied. – Gene Aug 09 '15 at 16:57
  • @Gene k is a key, from a set of keys, which happens to get an integer. – Pavel Aug 09 '15 at 17:02
  • I edited your question to make this clearer. Please check out http://www.drdobbs.com/architecture-and-design/generating-perfect-hash-functions/184404506. A simple and good heuristic. I'm pretty sure true minimization is NP hard. There are also some references there. – Gene Aug 09 '15 at 18:30
  • Why is minimising M desirable? Surely minimising the maximum index is a better goal as it indicates better compaction of the hash values.... – Tony Delroy Aug 14 '15 at 04:02
  • @TonyD if M is minimal less space is being used, I guess. – Pavel Aug 14 '15 at 11:17

2 Answers2

1

If, instead of a numeric multiplication you could perform a logic computation (and / or /not), I think that the optimal solution (minimum value of M) would be as small as card(K) if you could get a function that related each value of K (once ordered) with its position in the set.

Theoretically, it must be possible to write a truth table for such a relation (bit a bit), and then simplify the minterms through a Karnaugh Table with a proper program. Depending on the desired number of bits, the computational complexity would be affordable... or not.

Little Santi
  • 8,563
  • 2
  • 18
  • 46
0

If a is co-prime to M then a * k = a * k' mod M if, and only if, k = k' mod M, so you might as well use a = 1, which is always co-prime to M. This also covers all the cases in which M is prime, because all the numbers except 0 are then co-prime to M.

If a and M are not co-prime, then they share a common factor, say b, so a = x * b and M = y * b. In this case anything multiplied by a will also be divisible by b mod M, and you might as well by working mod y, not mod M, so there is nothing to be gained by using an a not co-prime to M.

So for the problem you state, you could save some time by leaving a=1 and trying all possible values of M.

If you are e.g. using 32-bit integers and really calculating not (a * k) mod M but ((a * k) mod 2^32) mod M you might be able to find cases where values of a other than 1 do better than a=1 because of what happens in (a * k) mod 2^32.

mcdowella
  • 19,301
  • 2
  • 19
  • 25