2

I have been reading and learning hashing and hashtables and experemented with some code(I am still very new to this so I might say something wrong that I missunderstood). I came to the issue for perfect hash functions. Provided that I have my own custom type that somehow has a perfect hash function:

class Foo
{
    private int data;

    override int GetHashCode()
    {
        return data.GetHashCode();
    }
}

An int's hash code is the int itself so I have a perfect hash function, right? But when we use the hashing function to map the objects to a hashtable by the simple formula:

index = foo.GetHashCode() % hashtable.Length

we get a variable index that depends on also how many elements we have in the hashtable. If the hashtable's size was int.MaxValue only then we will have a perfect hash function. For example lets say that we have a hashtable with size of 2. And if we hash for example the numbers 1 and 3 we get

1 % 2 = 1
3 % 2 = 1

A collision! Have I understood anything wrong about hashing and hashtables? It comes out that a perfect hash function is not perfect.

Bosak
  • 2,103
  • 4
  • 24
  • 43
  • 1
    If you can write a perfect hash function I think there's a million dollars waiting for you somewhere. – ChiefTwoPencils May 11 '13 at 20:41
  • 4
    @C.Lang A perfect hash function is easy to write when one restricts the set of hashable data. – Seth Carnegie May 11 '13 at 20:54
  • @SethCarnegie: thanks. I learned through implementations that aren't restrictive. And based on his comments in Julian's answer that's what OP was referring to. Anyway, just another thing to wrap my head around:)\ – ChiefTwoPencils May 11 '13 at 21:15

3 Answers3

7

You have it all right until this point

index = foo.GetHashCode() % hashtable.Length

Your hash function is perfect, but when you calculate the modulo, you're actually using a different hash function. In this case, your hash function int.GetHashCode is perfect, but your data structure using foo.GetHashCode() % hashtable.Length is not. That is, one thing is the hash of your objects, and a different thing is the hash used by the structure holding those objects.

For your data structure to be perfect too, its maximum size must also be the number of ints.

So why don't we have collisions in Dictionary? Actually, we do. If two objects A and B do have the same hash in the dictionary, we have a collision. What happens is that the dictionary runs A.Equals(B) as the final check to see if the two objects actually are the same or not. If they are, you get an exception for having duplicates. If they don't, they are both kept under the same dictionary hash.

Julián Urbano
  • 8,378
  • 1
  • 30
  • 52
  • Yes I tought of it, but isn't that how hashtables are implemented? For example in C# the Dictionary. And isnt an array of length int.MaxValue too big to be effective and a lot of memory will be wasted? – Bosak May 11 '13 at 20:52
  • 2
    Yes, but Dictionaries do have collisions. What happens is that whenever you have a collision the dictionary checks the `Equals` method with the two colliding objects. That one is the final check – Julián Urbano May 11 '13 at 20:53
  • yes I know they have collisions and I know that many collision solving algorithms exist, but that is not the case. So do you know if a perfect hash function can exist(working with the modulo) so that it works for any size of the array. – Bosak May 11 '13 at 20:59
  • 1
    No, there isn't. By definition, if your modulo is n < int, then you have at most n slots in your dictionary. No matter what you do, you have at most n, so there are always collisions. Even if you don't work with modulos: if you have n < int slots, you'll always have collisions. – Julián Urbano May 11 '13 at 21:05
3
  1. Yes! (as said, by definition)

  2. Where do you get a p.h.f from in the first place? You want to hash a fixed, i.e. constant set S of different (i.e. no multiset) values to the set 1..|S|, bijectively. Apparently then, the p.h.f depends on the set S.

  3. Also, remove a single element from S, and add another one, you almost surely get a collision (of the new element with an old one).

  4. So, you actually want "a p.h.f. for such-and-such well defined/described set". And then we can try to find one.

Michael
  • 63
  • 5
2

Yes, a perfect hash function is guaranteed not to have collisions.

That's its very definition!

From Wikipedia (http://en.wikipedia.org/wiki/Perfect_hash_function)

A perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions. A perfect hash function has many of the same applications as other hash functions, but with the advantage that no collision resolution has to be implemented

Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • yes, but it actually has collisions see the example I made. This is a function that gives a one to one relationship. The problem comes not in the hash function, but in the way index is calculated( `i = hash(obj) % length` ) – Bosak May 11 '13 at 20:48
  • 3
    Then the example you made isn't a perfect hash function. However, the answer to your question "Does a perfect hash function guarantee no collisions?" has to be "yes" because, as I pointed out, that is the very definition of a perfect hash function. – Matthew Watson May 11 '13 at 22:34