According to http://java-bytes.blogspot.com/2009/10/hashcode-of-string-in-java.html: "First off, its a known fact that there is no perfect hashing algorithm, for which there are no collisions."

The author is speaking practically and not theoretically, right? Because theoretically, here is a perfect hash function: "for a given object, assign it a new number". There are infinitely many numbers, so there will always be an unused one to assign to the next object. In practice this isn't feasible, though, because we have a limited amount of memory.

templatetypedef
HukeLau_DABA
  • How does your "perfect" function determine what a "new number" is? – Kenneth K. Apr 12 '13 at 18:43
  • Except that in Java (which seems to be your language) a hash is an integer, which can only take 2^32 values. – assylias Apr 12 '13 at 18:43
  • Kenneth, this wouldn't be possible on a computer for the reasons I've explained, but theoretically it would be the previous hash code/number + 1. – HukeLau_DABA Apr 12 '13 at 18:55
  • @user1040923: “For a given object, assign it a new number” is not a hash function because it is not a function: if I give it the same object twice, it returns different numbers. A function must return the same value given the same input. For your proposed function to do this, it must recognize an input as equal to a previous input and provide that previous input’s number. Doing that requires remembering and looking up previous inputs. And even if you implement that, it makes the hash function dependent on the order in which inputs are presented, which breaks some applications of hashes. – Eric Postpischil Apr 12 '13 at 19:32
  • @EricPostpischil: I think the OP was talking about a theoretical function where, in advance, you line up all possible inputs and just assign each one the next number in the sequence. That actually *would* be a valid hash function if the input domain was finite. – templatetypedef Apr 12 '13 at 19:34
  • This question is more suitable for http://programmers.stackexchange.com/. – Lundin Apr 19 '13 at 06:07

1 Answer

Typically, a hash function maps from one set of objects (the universe) to a smaller set of objects (the codomain). Commonly, the universe is an infinite set, such as the set of all strings or the set of all numbers, and the codomain is a finite set, such as the set of all 512-bit strings, or the set of all numbers between 0 and some number k, etc. In Java, the hashCode function on objects has a codomain consisting of the values that can be represented by an int, which is the set of all 32-bit signed integers.

I believe that what the author is talking about when they say "there is no perfect hash function" is that there is no possible way to map the infinite set of all strings into the set of all 32-bit integers without having at least one collision. In fact, if you pick 2^32 + 1 different strings, you're guaranteed to have at least one collision by the pigeonhole principle, since there are only 2^32 possible hash codes to go around.
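You don't need anywhere near that many strings to see a collision in practice. As a small illustration (the class name `CollisionDemo` and the helper `collide` are made up for this sketch), the two-character strings "Aa" and "BB" are a well-known pair that `String.hashCode` maps to the same int:

```java
class CollisionDemo {
    // Hypothetical helper: true when two *different* strings share a hash code.
    static boolean collide(String x, String y) {
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode is specified as s[0]*31^(n-1) + ... + s[n-1],
        // so 'A'*31 + 'a' = 65*31 + 97 and 'B'*31 + 'B' = 66*31 + 66.
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(collide("Aa", "BB"));  // true
    }
}
```

This is only one concrete pair; the counting argument above is what guarantees that collisions must exist for any function into a 32-bit codomain.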

Your argument - couldn't we just assign each object a different hash code? - makes the implicit assumption that the codomain of the hash function is infinite. For example, if you were to try this approach to build a hash function for strings, the codomain of the hash function would have to be at least as large as the set of natural numbers, since there are infinitely many strings. Most programming languages don't support hash codes that work this way, though you're correct that in theory this would work. Of course, someone might object and say that this doesn't count as a valid hash function, since typically hash functions have finite codomains.
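To make the "infinite codomain" idea concrete, here is a rough sketch of a collision-free code for Java strings that uses `BigInteger` instead of `int`. The class `UnboundedCode` and the method `injectiveCode` are invented for this example and are not part of any library; the scheme simply reads the string's chars as base-65536 digits behind a leading 1.

```java
import java.math.BigInteger;

class UnboundedCode {
    // Hypothetical injective "hash": distinct strings always get distinct codes,
    // but the result is unbounded, so it cannot be used as an int hashCode.
    static BigInteger injectiveCode(String s) {
        BigInteger code = BigInteger.ONE; // leading 1 keeps "" and "\u0000" apart
        for (int i = 0; i < s.length(); i++) {
            // Append the next 16-bit char as one more base-65536 digit.
            code = code.shiftLeft(16).or(BigInteger.valueOf(s.charAt(i)));
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(injectiveCode("Aa").equals(injectiveCode("BB"))); // false
        // The code needs 16 bits per char plus the marker bit.
        System.out.println(injectiveCode("hello").bitLength()); // 81
    }
}
```

The price is visible in the bit length: the code is as large as the string itself, which is why this is more of a re-encoding than a hash function in the usual sense.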

Hope this helps!

templatetypedef