3

I have a question about choosing the q and d in Rabin-Karp algorithm for searching strings. The algorithm uses the values q as modulus and d as hash function.

If I choose q as number of power 2 and d=q-1 or d=q+1

  • How can these choices affect spurious hits?

  • What patterns will be sure spurious hits in case d=q+1 and in case d=q-1?

R. Martinho Fernandes
  • 228,013
  • 71
  • 433
  • 510
prilia
  • 996
  • 5
  • 18
  • 41

1 Answers1

1

X mod (2^n +1) can be described as alternating sum / difference e.g.

X= 0x11223344, n=8, X mod 257 == 0x44 - 0x33 + 0x22 - 0x11 mod 257. The problems here is that any repeating letter will cancel itself -- not very practical -- of course n doesn't have to be 8.

X mod (2^n -1) will be the sum of all n bit patterns, e.g.

X= 0x11223344, n=8, X mod 255 ==( 0x44 + 0x33 +0x22 +0x11 ) mod 255

The problem here is that toggling bytes cancel themselves: Hash('ab') = Hash('ba').

Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57