I was thinking about this a lot as well. Here's what I've concluded. (Let's say n is the length of the string to search and m is the length of the pattern.)
In the naive brute-force solution to string matching, the only reason you need to back up and retry the pattern (length m) at every position of the string (length n) is that the pattern might contain repeats.
For example:
string: abcdabcdabcd
pattern:abcde
Iteration 1:
string: abcdabcdabcd
        ^
pattern:abcde
        ^
Iteration m:
string: abcdabcdabcd
            ^
pattern:abcde
            ^
Mismatch! So on iteration m+1, the naive approach moves the string pointer back to position 2 and restarts the pattern:
string: abcdabcdabcd
         ^
pattern:abcde
        ^
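To make the cost of that reset concrete, here is a minimal sketch of the brute-force search that also counts character comparisons. The function name `naive_search` and the comparison counter are my own additions for illustration, not part of any library.

```python
def naive_search(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        j = 0
        # Compare the pattern against text[start:start + m], one character at a time.
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break  # mismatch: give up on this start, back up to start + 1
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


print(naive_search("abcdabcdabcd", "abcde"))  # (-1, 16)
```

Starts 2, 3 and 4 (1-based) each cost one comparison that is guaranteed to fail, because those string characters were already known to be b, c and d.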
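Now in the case of KMP, on iteration m+1, we don't need to reset the string pointer that far back. The character at position 2 of the string (1-based indexing) has already matched position 2 of the pattern, so if it also matched the start of the pattern, the pattern's first two characters would have to be the same. This pattern has all distinct characters, so none of the characters we already matched can start a new match, and the string pointer stays where it is.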
KMP iteration m+1, pattern has all distinct characters:
string: abcdabcdabcd
            ^
pattern:abcde
        ^
If there are repeats, then on iteration m+1 we don't reset the pointer on the pattern as far:
KMP iteration m+1, pattern has runs of characters:
string: aaaac
            ^
pattern:aaaab
           ^
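Both cases fall out of the same table. Below is a rough sketch of KMP in Python, assuming the usual prefix-table formulation; the names `prefix_table` and `kmp_search` are mine. For each pattern position, the table stores how far the pattern pointer can fall back after a mismatch.

```python
def prefix_table(pattern):
    """fail[j] = length of the longest proper prefix of pattern[:j + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text, pattern):
    """Return the index of the first match, or -1.
    The text pointer i only ever moves forward."""
    if not pattern:
        return 0
    fail = prefix_table(pattern)
    j = 0  # pattern pointer
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern only; i is never reset.
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            return i - j + 1
    return -1


print(prefix_table("abcde"))  # [0, 0, 0, 0, 0]
print(prefix_table("aaaab"))  # [0, 1, 2, 3, 0]
```

With all distinct characters the table is all zeros, so a mismatch sends the pattern pointer straight back to the start. With the run of a's, a mismatch on the final b falls back only one step at first (from index 4 to index 3, 0-based), which is the second picture above.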