0

I am reading about hopscotch hashing
The algorithm says when we run into colision during insert:

Otherwise, j is too far from i. To create an empty entry closer to i, find an
item y whose hash value lies between i and j, but within H − 1 of j, and
whose entry lies below j. Displacing y to j creates a new empty slot closer
to i. Repeat. If no such item exists, or if the bucket already i contains H
items, resize and rehash the table.

I am not sure how this works.
Example. Assume H = 3 and that a,b,c all map to 0 and d,e map to 1
We have:

 0  1  2  3  4  5    
[a, b, c, d,  ,  ] the table with 2 slots empty    
    i       j

b,c are within H - 1 (2) places away of their position (0) in locations 1,2 of the table and d is within 2 positions from its location 1.
If I try to insert e that also maps to 1 I would start from index 4 (an empty slot found with linear probing) and would work backwards towards 1.
According to the algorithm index 3 (which has now d) is between i and j (which are 1 and 4 respectively) and within H - 1 i.e. 2 positions of j.
So we can swap and have:

 0  1  2  3  4  5    
[a, b, c,  , d ,  ] the table with 2 slots empty    
    i       j

so now the empty slot is 3 and we can insert e since it is 2 positions away from i.
But now d whose hash value maps to 1 is more than 2 positions from 1 and can no longer be found.

So how does this algorithm work?
Note: My assumption and understanding is that the hop value is only an optimization trick for not having to run the equality over all H elements belonging to the bucket and not related with the core algorithm per se.

Jim
  • 18,826
  • 34
  • 135
  • 254

1 Answers1

1

It says find an item whose hash value lies between i and j, but within H-1 of j. It doesn't say find an item whose current location lies between i and j, but within H-1 of j. The d at index 3 has a hash value of 1, which doesn't lie within 2 positions of 4.

In your example, there are no suitable slots, so the following sentence takes effect and the table should be resized and rehashed.

hobbs
  • 223,387
  • 19
  • 210
  • 288
  • So the algorithm would apply the hash function for each of the elements between i and j to find those who fall within H-1 of j? So in this case rehash d and apply modulo to get 1 to see that d in index 3 is not a candidate for swapping? – Jim Jun 11 '15 at 20:51
  • So that is why the hop info are needed? Because in that case without the hop info this rehashing is expensive isn't it (but for H elements and I guess H/2 on average) – Jim Jun 11 '15 at 20:54
  • @Jim you could do that and get correct behavior, yes. Or you could use the "hop info" bits as the article describes to figure it out without recomputing hashes, which is a lot more efficient. – hobbs Jun 11 '15 at 20:55
  • But takes more space. So the performance penalty for H rehashes is significant? Or depends on the use case? – Jim Jun 11 '15 at 20:57
  • That's up to you. A few bytes is probably worth the expense. By the way, "rehashing" is something completely different. – hobbs Jun 11 '15 at 20:58
  • 1)Why a few bytes? It is 4 bytes per key. 2) I mean re-apply the hash. You are right my misphrase – Jim Jun 11 '15 at 21:04