I am trying to implement the Rabin-Karp algorithm for substring search, and I got stuck at the rolling hash (I am trying to use the formula suggested on Wikipedia).
    #include <stdio.h>
    #include <string.h>

    #define MOD 1000000007

    /* Note: pow() here is my own unsigned long long version, not the one from <math.h> */
    unsigned long long rolling_hash(const char *str)
    {
        unsigned long long hash = 0;
        size_t str_len = strlen(str);
        for (size_t i = 0, k = str_len - 1; i < str_len; i++, k--) {
            hash = hash + str[i] * pow(257, k);
            // hash = hash % MOD;
        }
        return hash;
    }

    int main(void)
    {
        printf("%llu\n", rolling_hash("TestString"));
        printf("%llu\n", rolling_hash("estStringh"));
        unsigned long long old = rolling_hash("TestString");
        // Add a character to the end:
        // since the last char in old was multiplied by 1, now multiply it by
        // the base and then add the _new_ character to the end
        old = old * 257 + 'h';
        //old = old % MOD;
        // Remove a char from the start:
        // after the multiplication above, 'T' carries the weight 257^10,
        // so subtract exactly that contribution
        old = old - 'T' * pow(257, 10);
        printf("\n%llu\n", old);
        return 0;
    }
The code above works perfectly fine as long as I do not introduce any remainder operations. Once I uncomment my % operations, things break down: the hash I get by updating the rolling hash no longer equals the one printed by the second printf.
Regarding janisz's answer: changing the hash generator as suggested there got the remainder to work when adding new characters, but NOT when removing the old ones.
Note: I am using my own pow function so that it works with unsigned long long.