1

Here's the problem:

X is a positive integer (include 0) set which has n different elements I know in advance. All of them is less equal than m. And I want to have an occ-free hash function as simple as possible to map them to 0-n-1.

For example:

X = [31,223,121,100,123,71], so n = 6, m = 223.

I want to find a hash function to map them to [0, 1, 2, 3, 4, 5].

If mapping to 0-n-1 is too difficult, then how to mapping X to a small range is also a problem.

Finding such a function is not too difficult, but to be simple and easy to be generated is hard.

It's better to preserve the order of the X.

Any clues?

mxmxlwlw
  • 31
  • 6

2 Answers2

2

My favorite perfect hash is pretty easy.

The hash function you generate has the form:

hash = table1[h1(key)%N] + table2[h2(key)%N]

h1 and h2 are randomly generated hash functions. In your case, you can generate random constants and then have h1(key)=key*C1/m and h2(key)=key*C2/m or something similarly simple

To generated the perfect hash:

  1. Generate random constants C1 and C2
  2. Imagine the bipartite graph, with table1 slots and table2 slots as vertices and an edge for each key between table1[h1(key)%N] and table2[h2(key)%N]. Run a DFS to see if the graph is acyclic. If not, go back to step 1.
  3. Now that you have an acyclic graph, start at any key/edge in each connected component, and set its slots in table1 and table2 however you like to give it whatever hash you like.
  4. Traverse the tree starting at the vertices adjacent to the edge you just set. For every edge you traverse, one of its slots will already be set. Set the other one to make the hash value come out however you like.

That's it. All of steps (2), (3) and (4) can be combined into a single DFS traversal pretty easily.

The complete description and analysis is in this paper.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • I think there's some place to improve. Because there's some unused info, and hash function is not simple enough. The generation is complex too. It's a valid solution though. – mxmxlwlw Feb 01 '22 at 02:18
1

As the number of entries is small, you can use brute force. I found this (Java: long is 64 bit signed, int is 32 bit signed):

private static int hashReduce(long x, long seed, int n) {
    long x1 = ((long) x + seed);
    int x2 = (int) ((x1 >>> 32) ^ x1);
    int x3 = ((x2 >>> 16) ^ x2) * 0x45d9f3b;
    return (int) (((x3 & 0xffffffffL) * n) >>> 32);
}

public static void main(String... args) throws InterruptedException {
    long[] data = new long[] { 31, 223, 121, 100, 123, 71 };
    for (int j = 0; j < 6; j++) {
        System.out.println(data[j] + " -> " + hashReduce(data[j], -2126115507L, 6));
    } 
}

x: 31 -> 0
x: 223 -> 1
x: 121 -> 2
x: 100 -> 3
x: 123 -> 4
x: 71 -> 5
Thomas Mueller
  • 48,905
  • 14
  • 116
  • 132
  • That's interesting. But if n is large, it will be much harder. I think this is a good idea, just make it less time-consuming would be better. – mxmxlwlw Aug 08 '23 at 02:40
  • You can use brute force for a surprisingly large size. And if the size it too large, then you can split into subsets that you brute-force. This is the idea of the RecSplit algorithm I invented: https://arxiv.org/abs/1910.06416 – Thomas Mueller Aug 16 '23 at 15:44