
Hi all, I have a big problem with my hash function. Let me try to explain it:

I have a set of chars and I want to write a hash function for it, because I want to replace the set with a hash set. Each char has an index, and what I do now is join the indices:

pair --> index p = 1, index a = 2, index i = 3, index r = 4 ---> so my hash returns 1234

But if, for example, I have

so --> index s = 12, index o = 34 ---> hash 1234

COLLISION!!!!
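
To make the collision concrete, here is a minimal C++ sketch of the behaviour described above. The name `joinDecimal` is a hypothetical stand-in for the question's `join()` (its real implementation is not shown), assumed here to append the decimal digits of the index to the digits of the running hash; `hashIndices` is likewise only an illustrative helper.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for join() from the question: appends the
// decimal digits of idx to the decimal digits of h.
std::uint64_t joinDecimal(std::uint64_t h, std::uint64_t idx) {
    return std::stoull(std::to_string(h) + std::to_string(idx));
}

// Folds a sequence of char indices into one value with joinDecimal.
std::uint64_t hashIndices(const std::vector<std::uint64_t>& indices) {
    std::uint64_t h = 0;
    for (std::uint64_t idx : indices) {
        h = joinDecimal(h, idx);
    }
    return h;
}
```

With this sketch, `hashIndices({1, 2, 3, 4})` and `hashIndices({12, 34})` both come out as 1234, because plain digit concatenation loses the information about where one index ends and the next begins.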

P.S.: I cannot sort my chars alphabetically....

So, is there anyone who can help me?? THANKS A LOT :)

  • This is why you shouldn't design your own hash. What's wrong with using existing widely-used ones, like md5/sha1? – Marc B Apr 17 '13 at 15:21
  • Commonly the chars are multiplied by well-chosen numbers and the results are often XORed together, but you do nothing like that... Typically several mathematical experts need months to develop a hash algorithm. – rekire Apr 17 '13 at 15:22
  • Because I also have integer numbers in my set, and if I have to call to_string and then pass the value to md5/sha1, it is too expensive :( – Leonardo Rania Apr 17 '13 at 15:25
  • Building a hash is more expensive than converting. Depending on your data structure you could simply hash the memory of your structure. So you don't need to convert anything. – rekire Apr 17 '13 at 15:30
  • No no trust me...building a hash is more expensive only for the brain :) I tried to use md5/sha1/superFast... – Leonardo Rania Apr 17 '13 at 15:33
  • Try a bit rotation combined with XORing in the next byte. That would result in fewer collisions, though it will still produce some. By the way, if you aren't an expert or a PhD I won't trust you ;-) – rekire Apr 17 '13 at 15:37
  • Yes, I didn't say that it is always expensive, but in my case it is :) However, I'm doing a master's thesis... – Leonardo Rania Apr 17 '13 at 15:44
  • In this case I'm just a little step ahead: I've completed mine. – rekire Apr 17 '13 at 15:48
  • good :) I hope to reach you as soon as possible :D – Leonardo Rania Apr 17 '13 at 15:51
  • xor is not good for me ..... my hash function now is : h = join(h,CHAR.getIndex()+1) – Leonardo Rania Apr 17 '13 at 15:52
  • Which programming language are you using? XOR should be fast; AFAIK XOR is also a CPU instruction. – rekire Apr 17 '13 at 15:58
  • C++ .... sure, xor is fast, but in my case I have to find a number to add to or multiply with CHAR.getIndex() (or some other operation) to resolve the collisions .... but my hash function must be h = join(h,CHAR.getIndex()).... – Leonardo Rania Apr 17 '13 at 16:02
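
For reference, the "bit rotation together with XOR" idea from the comments could look roughly like the C++ sketch below. The names `rotl32` and `rotXorHash`, the 32-bit width, and the rotation amount of 5 are all assumptions chosen for illustration, not something specified in the thread. Like the original `join` approach it depends on the order of the elements, and it reduces collisions rather than eliminating them.

```cpp
#include <cstdint>
#include <vector>

// Rotates a 32-bit value left by r bits (r is reduced modulo 32).
std::uint32_t rotl32(std::uint32_t x, unsigned r) {
    r &= 31u;
    if (r == 0) return x;  // avoid shifting by 32, which is undefined
    return (x << r) | (x >> (32u - r));
}

// Rotates the running hash, then XORs in the next index.
// The rotation amount 5 is an arbitrary illustrative choice.
std::uint32_t rotXorHash(const std::vector<std::uint32_t>& indices) {
    std::uint32_t h = 0;
    for (std::uint32_t idx : indices) {
        h = rotl32(h, 5) ^ idx;
    }
    return h;
}
```

For the two inputs from the question, `rotXorHash({1, 2, 3, 4})` and `rotXorHash({12, 34})` give different values, although other pairs of inputs can still collide.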

1 Answer


You could try the string hashing function of Java. This is my C# port, which should be simple to port to C++:

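// C# port of Java's String.hashCode(): h = 31 * h + c for each char.
int javaHash(string txt) {
    uint h = 0; // unsigned, so overflow wraps around (C# is unchecked by default)
    for (int i = 0; i < txt.Length; i++) {
        h = 31 * h + txt[i];
    }
    return (int)h; // reinterpret as a signed int, like Java's result
}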
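
Since the question is about C++, a possible port might look like the sketch below. It assumes the input is a `std::string` of single-byte characters, so it matches Java's `String.hashCode()` only for ASCII text (Java hashes UTF-16 code units). The same loop could be fed the char indices instead of the character codes.

```cpp
#include <cstdint>
#include <string>

// Polynomial hash with multiplier 31, as in Java's String.hashCode().
// Unsigned arithmetic is used so that overflow wraps around instead of
// being undefined behaviour.
std::int32_t javaHash(const std::string& txt) {
    std::uint32_t h = 0;
    for (unsigned char c : txt) {
        h = 31u * h + c;
    }
    // Conversion of out-of-range values is modular since C++20
    // (implementation-defined, but wrapping in practice, before that).
    return static_cast<std::int32_t>(h);
}
```

With this sketch, `javaHash("pair")` is 3433178 and `javaHash("so")` is 3676, so the two inputs from the question no longer collide. Multiplying by a fixed base before adding the next element is what keeps the element boundaries distinct, which plain concatenation does not.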
rekire