
Is it possible to have a perfect hash function from strings to integers, when the number of elements to be hashed is known? By perfect hash function I mean that there is no chance of collision.

Basically, I am reading the signatures of multiple tables from a file (e.g. id, name, address). Different tables might have common attributes (e.g. name), but in different positions (i.e. columns). I would like to be able to ask something like: what is table1["name"]? or table2["name"]?

UPDATE: I would prefer to learn how to do it myself rather than use something already out there.

user1377000
  • As posed this question is likely to get closed... What do you call a perfect hash - what limitations do you place on it? – Floris Mar 13 '13 at 16:38
  • Yes, it is: just keep track of all unique strings you've seen so far, and assign consecutive integer ids to them. Now, what's the actual problem you're trying to solve? – NPE Mar 13 '13 at 16:39
  • @Floris a perfect hash function is just that: *perfect*. There are no collisions between the input key material and the generated hash indexes. The limitations are singular: no collisions. – WhozCraig Mar 13 '13 at 16:40
  • Thank you. I have added the description of the problem. – user1377000 Mar 13 '13 at 16:44
  • Does it have to be minimal as well? Just curious. – WhozCraig Mar 13 '13 at 16:46
  • Sorry, I'm not sure what you mean. I was hoping to hash the name of the column to an array index, where the position of the column in the table would be stored. Does that answer your question? :) – user1377000 Mar 13 '13 at 16:49
  • @NPE: That doesn't really provide constant access. – user1377000 Mar 13 '13 at 23:27
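
For reference, the id-assignment idea from the comments could be sketched roughly as follows. This is only an illustration under assumptions: the function name `index_tables` and the `signatures` layout (table name mapped to its list of column names) are made up here, and the id assignment itself relies on an ordinary Python dict, which gives expected constant-time lookups rather than a guaranteed collision-free hash.

```python
def index_tables(signatures):
    """Assign consecutive ids to column names and record per-table positions.

    `signatures` is a hypothetical mapping: table name -> list of column names.
    Returns (ids, positions), where ids maps a column name to its integer id
    and positions[table][id] is that column's position in the table, or None
    if the table has no such column.
    """
    # First pass: give every distinct column name the next free integer id.
    ids = {}
    for columns in signatures.values():
        for name in columns:
            ids.setdefault(name, len(ids))

    # Second pass: one array per table, indexed by column id.
    positions = {}
    for table, columns in signatures.items():
        row = [None] * len(ids)
        for position, name in enumerate(columns):
            row[ids[name]] = position
        positions[table] = row
    return ids, positions


signatures = {
    "table1": ["id", "name", "address"],
    "table2": ["name", "id"],
}
ids, positions = index_tables(signatures)
print(positions["table1"][ids["name"]])  # position of "name" in table1
print(positions["table2"][ids["name"]])  # position of "name" in table2
```

Once a name has been turned into its id, every per-table lookup is a plain array access.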

1 Answer


See GNU gperf.

GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
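
If you would rather see the idea than use a generator, here is a minimal sketch of one simple way to get a collision-free table when the keys are known in advance: try seeds for a seeded hash function until every key lands in its own slot. This is not how gperf works internally; it is a brute-force illustration. The names `seeded_hash`, `find_perfect_seed` and `PerfectTable` are hypothetical, and using BLAKE2b from `hashlib` as the seeded hash is an assumption made for convenience. Brute force like this is only practical for small key sets such as a table's column names.

```python
import hashlib


def seeded_hash(key: str, seed: int) -> int:
    """64-bit hash of `key`; a different seed gives an unrelated hash function."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def find_perfect_seed(keys, max_tries=100_000):
    """Return a seed under which all keys map to distinct slots in len(keys) slots."""
    keys = list(keys)
    if not keys or len(set(keys)) != len(keys):
        raise ValueError("keys must be non-empty and distinct")
    size = len(keys)
    for seed in range(max_tries):
        slots = {seeded_hash(k, seed) % size for k in keys}
        if len(slots) == size:  # every key got its own slot
            return seed
    raise RuntimeError("no collision-free seed found within max_tries")


class PerfectTable:
    """Maps each column name of one table to its position, with no collisions."""

    def __init__(self, columns):
        columns = list(columns)
        self.seed = find_perfect_seed(columns)
        self.size = len(columns)
        self.keys = [None] * self.size
        self.positions = [None] * self.size
        for position, name in enumerate(columns):
            slot = seeded_hash(name, self.seed) % self.size
            self.keys[slot] = name
            self.positions[slot] = position

    def __getitem__(self, name):
        slot = seeded_hash(name, self.seed) % self.size
        # A single string comparison rejects names that were never inserted.
        if self.keys[slot] != name:
            raise KeyError(name)
        return self.positions[slot]


table1 = PerfectTable(["id", "name", "address"])
table2 = PerfectTable(["name", "id"])
print(table1["name"], table2["name"])
```

Because the table has exactly as many slots as keys, the resulting function is also minimal. The one comparison in `__getitem__` mirrors the "single string comparison" mentioned above: the hash is only perfect for the known keys, so unknown strings still have to be rejected explicitly.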

Shantanu
  • Hm, am I allowed to integrate it into my project? – user1377000 Mar 13 '13 at 16:45
  • Indeed. And being a GNU project, it is properly maintained. Use %language=C++ if you wish to use the output code in C++. Ref: [Gperf Declarations](http://www.gnu.org/software/gperf/manual/gperf.html#Gperf-Declarations). Just beware that the resulting output code is [GNU GPL](http://www.gnu.org/software/gperf/manual/gperf.html#Output-Copyright). – Shantanu Mar 13 '13 at 16:51
  • Thank you. I would prefer doing it myself though, just to learn how I might be able to achieve it, given the above problem description. – user1377000 Mar 13 '13 at 17:17
  • It's unclear that the *output* code is GPL. The GNU/gperf folks don't seem to think so. They note that "gperf is under GPL, but that does not cause the output produced by gperf to be under GPL." Check out http://www.gnu.org/software/gperf/manual/gperf.html#Output-Copyright for a more complete answer. – Nik Bougalis Mar 13 '13 at 18:06