
My question is closely related to this one:

Hash function on list independant of order of items in it

Basically, I have a set of N numbers. N is fixed and typically quite large, e.g. 1000. The numbers can be integers or floating-point values. Some or all of them may be equal, but none of them is zero.

Every combination of K of these numbers, for any K between 1 and N, requires a hash to be calculated.

Let's take an example with 3 numbers, which I will call A, B and C. I need to calculate a hash for each of the following combinations:

A
B
C
A+B
B+C
A+B+C
A+C
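The seven combinations listed are exactly the non-empty subsets of {A, B, C}. In general there are 2^N - 1 of them, which is astronomically many for N = 1000. A small Python sketch (for illustration only, and feasible only for small N) that enumerates them:

```python
from itertools import combinations

def all_combinations(items):
    """Yield every non-empty, order-independent combination of items."""
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)

combos = list(all_combinations(["A", "B", "C"]))
print(len(combos))                     # 7 == 2**3 - 1
print(["+".join(c) for c in combos])   # ['A', 'B', 'C', 'A+B', 'A+C', 'B+C', 'A+B+C']
```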

Combinations are order-independent: C+A is the same as A+C. The '+' can be a real addition or some other operation, such as XOR, but it is fixed. Likewise, each value may first be passed through a function, e.g.:

f(A)
f(B)
f(A)+f(B)+f(C)
...
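As a sketch of such an order-independent hash, assume that '+' is XOR and that f is a hypothetical transform which scrambles each value's 64-bit IEEE-754 bit pattern. The multiplier is an arbitrary choice, not something the question specifies:

```python
import struct
from functools import reduce

MASK64 = (1 << 64) - 1

def f(x) -> int:
    """Hypothetical per-value transform: take the 64-bit IEEE-754 pattern of x
    (integers are converted to float, an assumption made for simplicity) and
    scramble it with an arbitrary odd multiplier, modulo 2**64."""
    bits = struct.unpack("<Q", struct.pack("<d", float(x)))[0]
    return (bits * 0x9E3779B97F4A7C15) & MASK64

def combo_hash(values) -> int:
    """XOR of f(v) over the combination; XOR is commutative and associative,
    so the order of the values does not matter."""
    return reduce(lambda acc, v: acc ^ f(v), values, 0)

print(combo_hash([1.5, 2.0, 3.0]) == combo_hash([3.0, 1.5, 2.0]))  # True
```

One caveat with XOR: equal values cancel, so if A and B are equal numbers, A+B hashes to 0, the same as the empty combination. Addition modulo 2^64 is also commutative and avoids that particular cancellation, although it has collisions of its own.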

Now, I need to avoid collisions, but only in a specific way. Each combination is tagged with a number, either 0 or 1. Collisions are acceptable, if possible only between combinations that carry the same tag. Among those, many collisions are even welcome, especially if they make the hash value compact: ideally, the best hash would be only 1 bit long (0 or 1)! Collisions between combinations with different tags (0 and 1) should happen only rarely, if possible; this is the whole point.

Here is an example, showing each combination with its tag and its calculated hash value:

Combination  Tag  Hash
A            0    0
B            1    1
C            0    2
A+B          0    0
B+C          1    1
A+B+C        1    3
A+C          0    2

Here, the hash values are valid because no two combinations with different tags collide. For instance, A collides with A+B, but both are tagged 0.
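The validity rule amounts to this: every hash value must be produced only by combinations that share a single tag. A minimal Python check of the example table (the dictionary layout is an assumption made for illustration):

```python
from collections import defaultdict

# Example table: combination -> (tag, hash)
table = {
    "A":     (0, 0),
    "B":     (1, 1),
    "C":     (0, 2),
    "A+B":   (0, 0),
    "B+C":   (1, 1),
    "A+B+C": (1, 3),
    "A+C":   (0, 2),
}

def is_valid(assignment) -> bool:
    """True if no hash value is shared by combinations with different tags."""
    tags_per_hash = defaultdict(set)
    for tag, h in assignment.values():
        tags_per_hash[h].add(tag)
    return all(len(tags) == 1 for tags in tags_per_hash.values())

print(is_valid(table))  # True
```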

However, the hash is not very good overall, because it takes 4 distinct values (2 bits), which seems a lot for only 3 input numbers.
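The bit count follows from the number of distinct hash values: m distinct values need ceil(log2(m)) bits, assuming they are encoded as 0..m-1. A tiny sketch of that calculation for the example table:

```python
from math import ceil, log2

def bits_needed(hashes) -> int:
    """Bits needed to store the distinct values in hashes (at least 1)."""
    distinct = max(1, len(set(hashes)))
    return max(1, ceil(log2(distinct)))

# Hash column of the example table: 4 distinct values (0..3).
print(bits_needed([0, 1, 2, 0, 1, 3, 2]))  # 2
```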

How can I find a good (or good enough) hash function for this purpose?
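For a toy instance, one naive search is exhaustive. Assume, as an illustration and not the only option the question allows, that '+' is XOR and that each number is replaced by a k-bit value. Then try every assignment for k = 1, 2, ... until the tags are separated. The cost is 2^(k·N) assignments per width, so this is only workable for tiny N:

```python
from itertools import product

# Tags from the example table, keyed by the set of numbers in each combination.
TAGS = {
    frozenset("A"): 0, frozenset("B"): 1, frozenset("C"): 0,
    frozenset("AB"): 0, frozenset("BC"): 1, frozenset("ABC"): 1,
    frozenset("AC"): 0,
}

def xor_hash(combo, values) -> int:
    """Hash of a combination: XOR of the k-bit values of its members."""
    h = 0
    for name in combo:
        h ^= values[name]
    return h

def separates_tags(tags, values) -> bool:
    """True if no hash value is produced by combinations with different tags."""
    seen = {}
    for combo, tag in tags.items():
        if seen.setdefault(xor_hash(combo, values), tag) != tag:
            return False
    return True

def smallest_xor_width(tags, names="ABC", max_bits=4):
    """Try every k-bit value for every number, for k = 1, 2, ...
    Returns (k, values) for the first width that separates the tags, else None.
    Cost is 2**(k * len(names)) assignments per width, so tiny N only."""
    for k in range(1, max_bits + 1):
        for combo_values in product(range(2 ** k), repeat=len(names)):
            values = dict(zip(names, combo_values))
            if separates_tags(tags, values):
                return k, values
    return None

print(smallest_xor_width(TAGS))
```

For the example tags, no 1-bit or 2-bit XOR assignment separates the tags, so the search stops at k = 3. That is one bit more than the hand-made table, which is not an XOR combination of its single-number entries (A = 0 and B = 1, yet A+B = 0).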

Thank you for your insight.

Martin Frank
  • How about having separate hash tables for each tag value? – 500 - Internal Server Error Feb 04 '15 at 14:27
  • 1. Why is your table valid, when A+C and C have the same tag and hash but are different combinations? I see that as a collision. 2. Are collisions allowed? What does "should rarely happen" mean? An ill-defined question should be closed! 3. Can numbers be combined with functions? What are the functions, and how many are there? ... You should substantially edit your question! – Gangnus Feb 04 '15 at 14:30
  • A separate hash table for each tag value might make a lot of sense indeed. I thought about that for some time, but it doesn't simplify my problem. Does it? What do you have in mind? – Martin Frank Feb 04 '15 at 16:09
  • The functions are to be found (or an educated guess made), as are the individual numbers. That is basically the question. I used a generic solver for the numbers, but it would take years to find a solution. Please refer to the link I gave for more details, in particular the first answer, by 'Per'. – Martin Frank Feb 04 '15 at 16:12
  • If a combination and its associated tag are not correlated (i.e. the tag can't be predicted from the combination's structure), then I doubt you can gain anything by exploiting the tag. – Alexey Birukov Feb 04 '15 at 16:20
  • Thanks for your answer. Tags are part of the input. Clearly, there is no known correlation between any set of numbers and the tag. But the existence of the tag motivates finding a hash function (and values for A, B, and so on) under which collisions preferably happen between combinations with the same tag. – Martin Frank Feb 04 '15 at 16:47
  • I have this model in mind: you have positive words like "good", "warm", "fair" and negative ones like "poor", "sad", "ill", etc. My point is that there is no hash function on strings that can discriminate between them without using a dictionary. – Alexey Birukov Feb 05 '15 at 12:08
