
For example:

The source string is "Mac and Jack are friends" and the pattern string is "are".

So it looks as if matching always restarts from index 0 of the pattern, while the position in the source string advances character by character.

So it seems the complexity should be O(mn). In general I would say that KMP should have a worst-case complexity of O(mn), but I have read that KMP solves the substring matching problem in O(m+n), so I am curious about the worst-case analysis.

Girish
  • That's the best case for KMP: when all characters are unique, there is at most one "jump" for each character of the text – DAle Jul 04 '17 at 07:31
  • Reference : [String Algorithms Stanford](https://web.stanford.edu/class/cs97si/10-string-algorithms.pdf) – รยקคгรђשค Jul 05 '17 at 17:57
  • Possible duplicate of [What's the worst case complexity for KMP when the goal is to find all occurrences of a certain string?](https://stackoverflow.com/questions/9182651/whats-the-worst-case-complexity-for-kmp-when-the-goal-is-to-find-all-occurrence) – รยקคгรђשค Jul 05 '17 at 17:58

1 Answer


I was thinking about this a lot as well. Here's what I've concluded (let n be the length of the string to search and m the length of the pattern).

In the naive brute-force solution to string matching, the only reason you need to move the string pointer back and re-examine characters after a mismatch is that the pattern may contain repeats.

For example:

string: abcdabcdabcd
pattern:abcde

Iteration 1:

string: abcdabcdabcd
        ^
pattern:abcde
        ^

Iteration m:

string: abcdabcdabcd
            ^
pattern:abcde
            ^

Mismatch! So on iteration m+1, the naive approach moves the string pointer back and restarts the pattern:

string: abcdabcdabcd
         ^
pattern:abcde
        ^
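
To make the cost of that backtracking concrete, here is a minimal Python sketch of the brute-force matcher that also counts character comparisons. The function name `naive_search` and the comparison counter are my own illustration, not part of any library.

```python
def naive_search(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    # Try every possible starting position in the text.
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break  # mismatch: the next attempt restarts at start + 1
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


if __name__ == "__main__":
    print(naive_search("abcdabcdabcd", "abcde"))
    print(naive_search("a" * 10, "aaab"))
```

With a text of ten `a` characters and the pattern `aaab`, every one of the n - m + 1 starting positions costs m comparisons, which is where the O(mn) worst case comes from.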

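Now in the case of KMP, on iteration m+1 we don't need to move the string pointer back at all. The first four characters of the string just matched the first four characters of the pattern, so if the character at position 2 of the string (1-based indexing) also matched the start of the pattern, the pattern's first two characters would have to be equal. The pattern's characters are all distinct, so no match can start at positions 2 through 4, and we only reset the pattern pointer.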

KMP iteration m + 1, pattern has all distinct characters

string: abcdabcdabcd
            ^
pattern:abcde
        ^

If the pattern does contain repeats, then on iteration m+1 we don't reset the pattern pointer as far back:

KMP iteration m + 1, pattern has runs of characters

string: aaaac
            ^
pattern:aaaab
           ^
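
Both cases above can be sketched in Python as follows. This is a textbook-style version under my own naming (`failure_table`, `kmp_search`, and the comparison counter are illustrative assumptions), not a reference implementation. The point to notice is that the text index `i` never decreases: each comparison either advances `i` or shrinks the pattern index `j`, and `j` can only shrink as often as it has grown, so the scan makes at most 2n comparisons. Building the table is O(m) by the same argument, which gives O(m+n) overall.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return (index of first match or -1, comparisons made while scanning)."""
    if not pattern:
        return 0, 0
    fail = failure_table(pattern)
    comparisons = 0
    i = j = 0  # i indexes the text, j indexes the pattern
    while i < len(text):
        comparisons += 1
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                return i - j, comparisons
        elif j > 0:
            j = fail[j - 1]  # keep i where it is, shift only the pattern
        else:
            i += 1
    return -1, comparisons


if __name__ == "__main__":
    print(failure_table("abcde"), failure_table("aaaab"))
    print(kmp_search("abcdabcdabcd", "abcde"))
    print(kmp_search("aaaac", "aaaab"))
```

For `abcde` the table is all zeros, so a mismatch resets the pattern pointer to the start, as in the distinct-characters diagram. For `aaaab` the table is `[0, 1, 2, 3, 0]`, so after the mismatch on `b` the pattern pointer only drops from index 4 to index 3.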
zzzzzzz