I'm currently working on a string search program for a uni module, and I've implemented the algorithms successfully, at least to the point where they find the pattern consistently. I implemented Boyer-Moore and Rabin-Karp, and I also threw in a brute-force search after one of my coursemates ran into this exact problem and I realised I had the same issue: brute force is faster than Rabin-Karp on a wordlist.
Rabin-Karp seems to be spending most of its time performing the rolling hash. At first I wondered whether I was just getting a lot of collisions, but I managed to get collisions down to 3 by using a huge prime modulus. That added a little time, I'm assuming due to the size of the prime, but it seemed pretty clear that the rolling hash itself was the problem.
This is the rolling hash section:
    //hashes don't match, rehash using rolling hash to move on to next string section
    if (counter < (stringLength - patternLength)) {
        stringHash = (MAXCHAR * (stringHash - stringFile[counter] * hash) + stringFile[counter + patternLength]) % prime;
        if (stringHash < 0) {
            stringHash += prime; //when hash value is negative, make it positive
        }
    }
    if (!found) {
        counter++;
    }
I wanted to try searching through a huge text file, so I used the rockyou wordlist. Boyer-Moore is very happy with it, and Rabin-Karp takes under a second, but brute force takes less than half the time of Rabin-Karp, which just doesn't make sense to me.
Am I misunderstanding how these algorithms should be applied, or is there a problem with the rolling hash process that I'm using?