
I have a series of hash keys, many of which need to map to the same result. If I know exactly which hash keys need to map where, how can I go about designing a hash function that maximizes these collisions (minimizes necessary table size)? I am interested only in function speed and table size, not security.

I will be doing this for >100 tables, and am primarily asking for tips/suggestions on how to approach the problem, since each table will have its own range of keys and such.

As an example, here is the first table I am working on. The keys range from 0 to 31 (not all will occur in practice), and each should map to a particular group.

Group    Key
1        0
2        3,6,9,15,18,21,24,27
3        8,10,12,14,16,20,22,26,28
4        11,17,23,29
5        13,25
6        19
7        31
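
For concreteness, even without a clever hash function, this 32-key table packs into two 64-bit constants at 4 bits per key (a sketch; the constant values below encode the table above, with nibble 0 marking keys that never occur in practice):

```c
#include <stdint.h>

/* Packed lookup: 4 bits per key, 32 keys -> two 64-bit words.
   Key k's group sits in nibble (k & 15) of word (k >> 4). */
static const uint64_t groups[2] = {
    0x2353432302002001ULL,  /* keys 0..15  */
    0x7043235243236243ULL   /* keys 16..31 */
};

static unsigned group_of(unsigned key) {  /* key in 0..31 */
    return (unsigned)((groups[key >> 4] >> ((key & 15) * 4)) & 0xF);
}
```

That is 16 bytes of table and a shift/mask per lookup, which is the kind of size/speed trade-off I am after.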

I know I could just make a lookup table or a switch statement for this instance, but other tables will span approximately 2^12 different hash keys while having only 49 unique groups. So, to balance speed and table size, how would I even start trying to find a simple function for this (and similar cases)?
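
For the larger tables, the baseline any hash function would have to beat is a direct byte table: 2^12 entries at one byte each is 4 KiB per table (so even 100+ tables stay well under a megabyte), and a lookup is a single load. A sketch with hypothetical names, assuming the entries get filled from the precomputed key/group pairs:

```c
#include <stdint.h>

enum { TABLE_SIZE = 1 << 12 };  /* ~2^12 possible keys */

/* group_table[key] = group id (fits in a byte: only 49 groups) */
static uint8_t group_table[TABLE_SIZE];

static unsigned group_of(unsigned key) {
    return group_table[key & (TABLE_SIZE - 1)];
}
```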

EDIT: To clarify, I am not looking for an answer to this particular example. I am simply looking for insight into different patterns to look out for, how to take advantage of said patterns, common techniques (if this is a common thing to do?), etc. I am not sure that was clear in the original wording.

Hufflet
  • Have you researched "perfect hash functions"? Kinda the opposite of what you want, but maybe helpful for solving your problem. https://en.wikipedia.org/wiki/Perfect_hash_function (also check out e.g. gperf mentioned there). – hyde Aug 29 '19 at 03:34
  • Those numbers have some nice patterns to them. Are there always nice patterns like these, or is this one just particularly nice? – templatetypedef Aug 29 '19 at 03:34
  • I am looking at a plot of them and honestly I am not sure if the patterns are coincidence or not. I am generating the key/group pairs based off of magic bitboards (chess programming technique), so the keys are technically the indexes returned from a previous hash function (each set of keys comes from an independent previous function). In other words, I am afraid I really do not know. – Hufflet Aug 29 '19 at 03:38
  • The number 2^12 is only 4096: not a particularly large table. If every one of your tables was that big, you're talking about half a million entries at two bytes each. Figure a megabyte, tops, for all of your tables. Now, generating those tables might be a pain, but you only have to do it once. And direct lookup is probably going to give you the fastest function. – Jim Mischel Aug 29 '19 at 14:33

0 Answers