I am currently learning about pattern matching algorithms and have come across two of them: Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM). My general understanding is as follows:
KMP
- Compares text left-to-right
- Uses a failure array to shift intelligently
- takes O(m) time, where m is the length of the pattern, to compute the failure array
- takes O(m) space
- takes O(n) time to search a text of length n
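To check my understanding of the KMP points above, here is a minimal sketch in Python. The names `build_failure` and `kmp_search` are my own, and this is only one common way of writing the algorithm, not a reference implementation. The search function takes the failure array as an argument, so the preprocessing step is visibly separate from the search step.

```python
def build_failure(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it. O(m) time and space."""
    failure = [0] * len(pattern)
    k = 0  # length of the prefix currently matched
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text, pattern, failure):
    """Return the start index of every occurrence of pattern in text.
    Scans the text left to right and never moves backwards in it."""
    matches = []
    if not pattern:
        return matches
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


# The table depends only on the pattern, so it can be built once.
table = build_failure("aba")
print(kmp_search("abababa", "aba", table))
print(kmp_search("xxabaxx", "aba", table))
```

Nothing in `build_failure` looks at the text or at the alphabet, which is why its cost is O(m) regardless of what is searched afterwards.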
BM
- Compares the pattern right-to-left, starting from its last character
- Uses bad character jumps and good suffix jumps
- takes O(m + size of alphabet) time to compute the tables
- takes O(m + size of alphabet) space
- takes O(n) time to search, but is usually faster in practice
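For comparison, here is a simplified Python sketch of the BM idea that uses only the bad character rule. The good suffix rule is deliberately left out, so this is not the full algorithm and its worst case is O(nm) rather than the bound listed above. The names `build_last_occurrence` and `bm_search_bad_char` are my own. I am also assuming the table is stored as a dictionary keyed by the characters that occur in the pattern, instead of an array with one slot per alphabet symbol.

```python
def build_last_occurrence(pattern):
    """Map each character in the pattern to the index of its last occurrence.
    Characters absent from the pattern are simply missing from the dict."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_search_bad_char(text, pattern, last):
    """Return the start index of every occurrence of pattern in text,
    comparing right to left and shifting with the bad character rule only."""
    matches = []
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return matches
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare from the last pattern character backwards
        if j < 0:
            matches.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Line up the mismatched text character with its last
            # occurrence in the pattern, but never shift backwards.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


table = build_last_occurrence("example")
print(bm_search_bad_char("here is a simple example", "example", table))
```

With the dictionary representation the table holds at most m entries, so the alphabet size only matters if the table is laid out as a fixed array over the whole alphabet.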
I came across the following True/False question, which prompted this one:
The Knuth-Morris-Pratt (KMP) algorithm is a good choice if we want to search for the same pattern repeatedly in many different texts.
So I believe the answer is true, on the assumption that each time the algorithm is run on a different text, the preprocessing costs only O(m), whereas for BM it costs O(m + size of alphabet). However, I am not sure whether it is correct to assume that the table is recomputed on every run. If, say, the text is always in the English alphabet, I would only need to compute the table once and could then reuse it. So, in the end, does the answer depend on whether all the texts are drawn from the same alphabet, or is there some other factor that affects it?
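To make the two cases concrete, here is a rough cost comparison. It assumes k texts of lengths n_1, ..., n_k and takes the bounds listed above at face value. The symbols k, n_i, N and |Σ| (the alphabet size) are my own notation, not part of the exercise.

```latex
\begin{align*}
N &= \sum_{i=1}^{k} n_i \\
\text{tables rebuilt for every text:}\quad
T_{\mathrm{KMP}} &= O(km + N), &
T_{\mathrm{BM}} &= O\bigl(k(m + |\Sigma|) + N\bigr) \\
\text{tables built once and reused:}\quad
T_{\mathrm{KMP}} &= O(m + N), &
T_{\mathrm{BM}} &= O(m + |\Sigma| + N)
\end{align*}
```

If this is right, reusing the tables means the preprocessing term is paid only once, and the difference between the two algorithms shrinks to an additive |Σ| term.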