
I am confused about the relationship between KMP (Knuth–Morris–Pratt) string search and regex (DFA-based) searching.

My thought is that KMP cannot use regex notation (e.g., (A|B){2}C), so it can only search for a single string (e.g., AC or BC, but not AC|BC). Is this true?

Another question: if the pattern is a single string (ABABAC), are the two essentially doing the same thing?

JackWM
  • As a matter of fact, the Java reference implementation of the Pattern class uses a modification of the Boyer–Moore algorithm when the pattern is a fixed string. – nhahtdh Jun 26 '15 at 03:20
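For comparison with KMP, here is a minimal Python sketch of Boyer–Moore–Horspool, a simplified relative of Boyer–Moore that keeps only the bad-character shift. It only illustrates that idea under this simplification; it is not the actual code in Java's `Pattern` class, and the function name `horspool_search` is hypothetical.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return start indices of all occurrences of pattern in text (Boyer-Moore-Horspool sketch)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Bad-character table: shift distance when the window's last text character is c.
    # Later (rightmost) occurrences in pattern[:-1] overwrite earlier ones, giving the smallest safe shift.
    # Characters absent from pattern[:-1] allow a full shift of m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift by the text character aligned with the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

For example, `horspool_search("ABABABAC", "ABABAC")` returns `[2]`. Unlike KMP, it compares right to left and can skip text characters entirely, which is why this family is attractive for fixed strings.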

3 Answers


In fact, there is a generalized form of KMP that is a finite automaton: the Aho–Corasick algorithm, which searches for a whole set of strings at once. It's also easy to add a wildcard. IMO you can use regular expressions with KMP, but it's not so easy.
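A minimal Python sketch of Aho–Corasick, assuming a plain dict-based trie (the function name `aho_corasick_search` is hypothetical). The failure links play the same role as KMP's failure function, but over a trie of several patterns, so one pass over the text finds every occurrence of, say, both `AC` and `BC`:

```python
from collections import deque

def aho_corasick_search(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence of any pattern in text."""
    # Trie stored as parallel lists indexed by state; state 0 is the root.
    goto: list[dict[str, int]] = [{}]
    fail: list[int] = [0]
    out: list[list[str]] = [[]]
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(p)
    # BFS computes failure links: KMP's failure function generalized to a trie.
    # Depth-1 states keep fail = 0, which was set when they were created.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end here via the failure link (shorter suffixes).
            out[nxt] += out[fail[nxt]]
    # Scan the text once, following failure links on mismatch, as KMP does.
    results = []
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            results.append((i - len(p) + 1, p))
    return results
```

For instance, `aho_corasick_search("xACyBC", ["AC", "BC"])` returns `[(1, 'AC'), (4, 'BC')]`. A finite regex such as `(A|B){2}C` could be handled this way by expanding it to `["AAC", "ABC", "BAC", "BBC"]`; that expansion does not work for `*` or `+`, which is where a general regex engine is needed.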

Micromega

It seems (I'm 95% sure) that both algorithms should do exactly the same thing. In KMP, the step of falling back from position i in the text to the end of a prefix at position p behaves like a nondeterministic automaton that is in both states at once: the state right after the prefix, p, and the state further into the string, at position i. Once this NFA is converted into a DFA, each DFA state simulates a set of NFA states, so the search finishes in linear time. So the regex with a Kleene star (e.g. `.*ABABAC` for the pattern in the question) is equivalent to KMP.
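The construction behind this argument can be sketched in Python: build the DFA for `.*P` (any text ending in the pattern `P`) directly from KMP's failure function. State `q` means "the last `q` characters read match `P[:q]`", and a mismatch transition copies the transition of the state KMP would fall back to. This is the textbook KMP automaton, offered as an illustration of the answer's reasoning rather than a claim about any regex library; the helper names are hypothetical and the pattern is assumed non-empty.

```python
def prefix_function(p: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of p[:i+1] that is also its suffix."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi

def build_dfa(p: str, alphabet: set[str]) -> list[dict[str, int]]:
    """DFA for '.*p'; state q = number of pattern characters currently matched."""
    pi = prefix_function(p)
    m = len(p)
    dfa: list[dict[str, int]] = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for c in alphabet:
            if q < m and p[q] == c:
                dfa[q][c] = q + 1          # extend the match
            elif q == 0:
                dfa[q][c] = 0              # nothing to fall back to
            else:
                # Mismatch: behave like the fallback state KMP would try next.
                # pi[q - 1] < q, so that row is already filled in.
                dfa[q][c] = dfa[pi[q - 1]][c]
    return dfa

def dfa_search(text: str, p: str) -> list[int]:
    """Start indices of all occurrences of non-empty p, one DFA step per character."""
    dfa = build_dfa(p, set(p))
    m, state, hits = len(p), 0, []
    for i, c in enumerate(text):
        # Characters that never occur in p always lead back to state 0.
        state = dfa[state].get(c, 0)
        if state == m:
            hits.append(i - m + 1)
    return hits
```

`dfa_search("ABABABACABABAC", "ABABAC")` returns `[2, 8]`, the same as running KMP, but each text character costs exactly one table lookup. The price is a transition table with (m+1)·|Σ| entries instead of KMP's m-entry failure array.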

I. Cantrell

KMP cannot use regex notations, so it can only search for a "single" string. Is this true?

Yes. KMP is a string search algorithm, not a pattern matching algorithm.
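A minimal Python sketch of what "string search" means here (the function name `kmp_search` is hypothetical): the input is one literal pattern, and there is no syntax for alternation or repetition.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return start indices of all (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # Failure function: fail[i] = length of the longest proper border of pattern[:i+1].
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text once; the text index never moves backwards.
    hits, k = [], 0
    for i, c in enumerate(text):
        while k and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # continue so overlapping matches are found too
    return hits
```

`kmp_search("ABABABACABABAC", "ABABAC")` returns `[2, 8]`. To search for `AC|BC` you would have to run it once per alternative and merge the results, or switch to a multi-pattern generalization such as Aho–Corasick.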

Another question: if the pattern is a single string (ABABAC), are they essentially doing the same thing?

No, DFA-based matching is not equivalent to the KMP algorithm. It is, however, possible that advanced regex matching implementations employ KMP as an optimisation.

Bergi
  • Thanks! Why is KMP faster? For traversing the string being searched, they are both O(N), right? – JackWM Jun 25 '15 at 23:00
  • I assume you read the linked articles? Yes, they're both `O(N)`; a string search cannot do better than that for arbitrary inputs. Still, a) an implementation can be faster, and b) it has a different worst-case bound. – Bergi Jun 25 '15 at 23:12
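One way to read point b) (an interpretation, not necessarily what the commenter meant): KMP's total work is fewer than 2N character comparisons, but a single text character can trigger a chain of fallbacks, whereas a DFA always does exactly one transition per character. The hypothetical helper `kmp_comparison_counts` below counts KMP comparisons per text position, assuming a non-empty pattern:

```python
def kmp_comparison_counts(text: str, pattern: str) -> list[int]:
    """Per text position, the number of character comparisons KMP performs (pattern non-empty)."""
    # Standard KMP failure function.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    counts, k = [], 0
    for c in text:
        n_cmp = 0
        while True:
            n_cmp += 1            # compare c with pattern[k]
            if c == pattern[k]:
                k += 1
                break
            if k == 0:
                break             # mismatch at the start of the pattern: give up on c
            k = fail[k - 1]       # fall back and compare c again
        counts.append(n_cmp)
        if k == len(pattern):
            k = fail[k - 1]       # full match: fall back to look for overlaps
    return counts
```

With pattern `AAAAB` and text `AAAAC`, the final `C` costs five comparisons (`[1, 1, 1, 1, 5]`) while the total, 9, stays below 2N = 10; a DFA would spend one step on every character.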