0

This is a prefix hashing function. i want to count the number of collisions in this method but i am not sure how to do it. It seems like it might be simple but i just cant think of a great way to do it....

int HashTable_qp::preHash(string & key, int tableSize )
{
    string pad = "AA";
    //some words in the input are less than 3 letters
    //I choose to pad the string with A because all padded characters 
    //have same ascii val, which is low, and will hopefully alter the results less
    if (key.length() < 3)
    {
        key.append(pad);
    }
    return ( key[0] + 27 * key[1] + 729 * key[2] ) % tableSize;
}
user977154
  • 1,045
  • 4
  • 19
  • 39
  • 1
    create a histogram unsigned [tablesize], generate some (all) the possible strings and compute their hashval, and update the histogram accordingly histogram[hashval] +=1; – wildplasser Dec 05 '11 at 23:48
  • @wildplasser, that would be the easiest way. My answer would be faster. If it's not performance critical I'd go with **wild's** idea. (You should probably post that as an answer, to help others find it when they find this page.) – Brigand Dec 05 '11 at 23:51

3 Answers3

1

If it's an array as the underlying data structure do: int hash = preHash(&key, array.length); if(array[hash] != null) this.count++; If it's an array of linked lists do:

if(array[hash] != null && *(array[hash]) != null)
this.count++

If you only have access to the stl library I believe just testing that element is null before adding it would be enough after calling the hash function.

  • i am using stl's, can you show me how it would look taking stl's into consideration. – user977154 Dec 06 '11 at 01:09
  • Easiest way is to inherit it and override it call: HashTable_qt::PreHash in the derived class, but instantiate the derived class and assign it a value and give the derived class a field variable called CollisionCount and before you add a new value initialize it to 0 and after call it and it should have the new value if greater than 1 the function was called with a new key – Alexander Wood Dec 06 '11 at 20:10
  • int Prehash(string & key, int tableSize ) { HashMap_qt::Prehash(key, tablesize); CollisionCount++; } in main HashTable_qt2* map = new HashMap_qt2() map.CollisionCount = 0; map.Add(...); cout << map.CollisionCount; – Alexander Wood Dec 06 '11 at 20:14
1

create a histogram:

 unsigned histogram[tablesize] = {0};

generate some (all) possible strings and compute their hashval, and update the histogram accordingly:

for(iter=0; iter < somevalue; iter++) {
 hashval = hashfunc( string.iterate(iter) ); // I don't know c++
 histogram[hashval] +=1;
 }

Now you have to analyze the hashtable for lumps / clusters. Rule of thumb is that for (tablesize==iter), you expect about 30 % cells with count =1, and about 30 % empty; the rest has two or more.

If you sum all the (count*(count+1))/2, and divide by the tablesize, you should expect around 1.5. A bad hashfunction gives higher values, a perfect hash would only have cells with count=1 (and thus: ratio=1) With linear probing you should of course never use tablesize=niter, but make tablesize bigger, say twice as big. You can use the same metric (number of probes / number of entries), to analyse its performance, though.

UPDATE: a great introduction on hashfunctions and their performance can be found at http://www.strchr.com/hash_functions .

wildplasser
  • 43,142
  • 8
  • 66
  • 109
0

You can create an array of integers, each representing one hash. When you're done making the hashes loop back through the array in a nested loop. If you had the following array,

[0] -> 13
[1] -> 5
[2] -> 12
[3] -> 7
[4] -> 5

For each item i in 0..n, check items i+1..n for matches. In English that would be: check if each element equals any of the elements after it.

Brigand
  • 84,529
  • 20
  • 165
  • 173