Is rabin-karp string search algorithm still correct if we neglect the modulo part and let hash int/long overflow?

Question

I have a question: if we let the rolling hash overflow, does it affect the correctness of the Rabin-Karp algorithm? Could you give a concrete example where overflow does affect correctness?

For instance, could the same string, e.g. "abcd", give different hash values depending on whether you compute the hash directly from "abcd" or roll it out of "eabcd" as (hash("eabc") - hash("e") * R^3) * R + hash("d")? In other words:

hash("abcd") != (hash("eabc") - hash("e") * R^3) * R + hash("d") if we allow int/long overflow

There isn't really a "standard" hash function to use with Rabin-Karp, and since your question is really about one specific hash function, you should specify which hash function that is. — Matt Timmermans, Jul 15 '20 at 00:32

score 1 · Accepted Answer · answered Jul 17 '20 at 23:35

If you use unsigned integers for the rolling hash, unsigned overflow is equivalent to reducing modulo 2^32 or 2^64, depending on the width of the unsigned type. So the answer to your question is yes, the algorithm will still be correct. (As an exercise, think about why unsigned overflow is equivalent to taking a modulus.)
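To make the claim concrete, here is a minimal sketch in C that compares a directly computed hash against a rolled one, using `uint32_t` so that every operation wraps modulo 2^32. The radix `R = 257` and the helper names `direct_hash`, `roll` and `rolling_matches_direct` are assumptions made for illustration; they do not come from any particular Rabin-Karp implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define R 257u  /* hypothetical radix, chosen only for this sketch */

/* Polynomial hash of s[0..m-1]. uint32_t arithmetic wraps around,
   so the result is implicitly reduced modulo 2^32. */
static uint32_t direct_hash(const char *s, size_t m) {
    uint32_t h = 0;
    for (size_t i = 0; i < m; i++)
        h = h * R + (unsigned char)s[i];
    return h;
}

/* Slide a window one step to the right: remove `out`, append `in`.
   r_pow must be R^(m-1), computed with the same wraparound. */
static uint32_t roll(uint32_t h, uint32_t r_pow, char out, char in) {
    return (h - (unsigned char)out * r_pow) * R + (unsigned char)in;
}

/* Returns 1 if the rolled hash equals the directly computed hash for
   every window of length m in text[0..n-1], and 0 otherwise. */
static int rolling_matches_direct(const char *text, size_t n, size_t m) {
    if (m == 0 || m > n)
        return 0;

    uint32_t r_pow = 1;
    for (size_t i = 1; i < m; i++)
        r_pow *= R;  /* wraps for larger m, which is intended */

    uint32_t h = direct_hash(text, m);
    for (size_t i = 0; ; i++) {
        if (h != direct_hash(text + i, m))
            return 0;
        if (i + m >= n)
            break;
        h = roll(h, r_pow, text[i], text[i + m]);
    }
    return 1;
}
```

For the example in the question, `roll(direct_hash("eabc", 4), R * R * R, 'e', 'd')` equals `direct_hash("abcd", 4)`, and the same holds for longer windows where the intermediate values really do overflow. Note that this relies on unsigned types: in C, signed overflow is undefined behavior, so a plain `int` or `long` would not give the same guarantee.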

In fact, many fast implementations skip the explicit modulo operation and rely on unsigned overflow as an implicit modulo; for an example, see the sample C implementation by Charras and Lecroq: https://www-igm.univ-mlv.fr/~lecroq/string/node5.html
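As a rough sketch of what such an implementation can look like (this is not the Charras and Lecroq code; the function name `rk_search` and the radix 257 are assumptions for illustration), the search below never writes `%` and lets `uint32_t` wrap. The `memcmp` check on every hash match is what makes the result exact: a wrapped hash can only produce extra candidate positions, never a missed match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of the first occurrence of pat in txt, or -1.
   Hashes are uint32_t, so they are implicitly reduced modulo 2^32. */
static long rk_search(const char *pat, const char *txt) {
    const uint32_t R = 257u;  /* hypothetical radix for this sketch */
    size_t m = strlen(pat), n = strlen(txt);

    if (m == 0) return 0;
    if (m > n) return -1;

    /* r_pow = R^(m-1), hp = hash(pat), ht = hash of the first window. */
    uint32_t r_pow = 1, hp = 0, ht = 0;
    for (size_t i = 1; i < m; i++)
        r_pow *= R;
    for (size_t i = 0; i < m; i++) {
        hp = hp * R + (unsigned char)pat[i];
        ht = ht * R + (unsigned char)txt[i];
    }

    for (size_t i = 0; ; i++) {
        /* Equal hashes are only a hint; confirm with a real comparison. */
        if (hp == ht && memcmp(pat, txt + i, m) == 0)
            return (long)i;
        if (i + m >= n)
            return -1;
        /* Roll the window: drop txt[i], append txt[i + m]. */
        ht = (ht - (unsigned char)txt[i] * r_pow) * R
             + (unsigned char)txt[i + m];
    }
}
```

With this sketch, `rk_search("abcd", "eabcd")` returns 1. If the `memcmp` verification were dropped, the search would become probabilistic whether or not an explicit modulus is used, so the verification step matters more for correctness than the choice between explicit and implicit modulo.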

Still, the modulo operation is retained in pseudocode presentation simply because it is best to make such an operation explicit when presenting the algorithm for both ease of understanding and attention to detail.

score 0 · Answer 2 · answered Jul 14 '20 at 23:57


I don't think it will affect the correctness of the algorithm, since two equal inputs will return the same output when submitted to the same function. As the rolling hash adds and subtracts elements, it shouldn't affect each individual result, even if it overflows.


Daniel


Then why does the standard Rabin-Karp require a modulo operation to prevent overflow? Frequent modulo operations are quite expensive. – maplemaple Jul 15 '20 at 00:05
Just because of a preference for dealing with non-negative numbers. Sometimes the values are used as array indexes, and you can't have negative indexes. It depends on the case. If the hash is used exclusively for string matching, allowing it to overflow seems acceptable. – Daniel Jul 15 '20 at 00:17
