
I'm facing the question of whether a certain regex implementation is based on a DFA or an NFA.

What are the starting points for figuring this out? Put differently: what am I looking for, and what are the basic patterns or characteristics? A good explanatory link or a small comparison (even if not specific to regex) is perfectly fine.

Jan
  • Consider posting this on http://cstheory.stackexchange.com/. – Codie CodeMonkey Nov 25 '11 at 10:59
  • I think your lingo is a little backwards. NFAs have the possibility of multiple execution paths, and so they're what need backtracking. Backtracking doesn't do a DFA any good, as it can only play out one way anyway. – phs Nov 25 '11 at 11:38
  • http://lambda.uta.edu/cse5317/notes/node9.html might also be relevant to your interests. Evaluating an NFA directly requires the algorithm to hold a set of states (or to backtrack through the alternatives), whereas a DFA evaluator always holds exactly one automaton state. – phs Nov 25 '11 at 11:41
  • So if I understand you correctly, it's either NFA + backtracking or DFA? – Jan Nov 25 '11 at 11:45
  • Yes, I believe so. It's been a while. – phs Nov 25 '11 at 11:49
  • @DeepYellow: No, cstheory is for research-level theoretical questions. – sdcvvc Nov 25 '11 at 12:33
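phs's point about "a set of states vs exactly one state" can be made concrete. Below is a minimal Python sketch, assuming a hand-built toy NFA and an equivalent hand-built DFA (both hypothetical, for the language "strings over {a, b} that end in ab"). The NFA runner carries a set of active states; the DFA runner carries a single state. A backtracking engine would instead follow one NFA path at a time and back up when that path fails.

```python
# Hypothetical toy automata for "strings over {a, b} that end in 'ab'".
# NFA: state 0 loops on everything and may guess that the final 'a' starts now.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_START, NFA_ACCEPT = 0, {2}

# Equivalent DFA: 0 = no progress, 1 = last char was 'a', 2 = last two were "ab".
DFA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
DFA_START, DFA_ACCEPT = 0, {2}


def nfa_accepts(text: str) -> bool:
    """Set-of-states simulation: advance every active NFA state in lockstep."""
    current = {NFA_START}
    for ch in text:
        nxt = set()
        for state in current:
            nxt |= NFA.get((state, ch), set())  # missing transition = dead path
        current = nxt
    return bool(current & NFA_ACCEPT)


def dfa_accepts(text: str) -> bool:
    """DFA run: exactly one current state at every step (input must be a/b only)."""
    state = DFA_START
    for ch in text:
        state = DFA[(state, ch)]
    return state in DFA_ACCEPT
```

Both runners take time linear in the input; the difference is only how much state each step has to carry.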

2 Answers


If it's a black box, give it some input and measure how its running time grows on a pathological case, comparing against the graphs in this discussion of NFA vs backtracking regex implementations (note that the NFA graph's time axis is in microseconds, not seconds).
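A minimal sketch of such a timing probe, assuming the classic pathological pattern a?ⁿaⁿ matched against aⁿ. Python's own `re` module (a backtracking engine) stands in for the engine under test here; the helper names `pathological` and `time_match` are hypothetical. On a backtracking engine the time should grow roughly exponentially in n, whereas an engine that simulates the NFA or runs a DFA should grow roughly linearly.

```python
import re
import time


def pathological(n: int) -> tuple[str, str]:
    """Build the pattern a?^n a^n and the subject a^n it is matched against."""
    return "a?" * n + "a" * n, "a" * n


def time_match(n: int) -> float:
    """Seconds Python's `re` (backtracking) spends on the size-n case."""
    pattern, text = pathological(n)
    compiled = re.compile(pattern)  # compile outside the timed region
    start = time.perf_counter()
    if compiled.fullmatch(text) is None:  # every a? matches empty, so this succeeds
        raise RuntimeError("expected a match")
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in range(2, 19, 4):
        print(f"n={n:2d}  {time_match(n):.6f} s")
```

To test a different engine, swap the body of `time_match` for a call into that engine and keep the same pattern and subject.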

Also, if it's a pure NFA, it won't support the non-regular features (such as backreferences) found in some "regular expression" parsers, since those require backtracking.
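For example, a backreference lets a pattern require "the same number of a's on both sides of b", which no finite automaton can recognize. A minimal probe in Python (whose `re` supports backreferences, consistent with it being a backtracking engine); `probe_backreference` is a hypothetical helper name. If the engine under test rejects the equivalent pattern, that hints at, but does not prove, a pure automaton implementation.

```python
import re

# (a+)b\1 matches a^n b a^n for n >= 1. That language is not regular, so a
# pure NFA/DFA engine cannot support this pattern.
SAME_COUNT = re.compile(r"(a+)b\1")


def probe_backreference(text: str) -> bool:
    """True if `text` has the same number of a's before and after the b."""
    return SAME_COUNT.fullmatch(text) is not None
```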

Alternatively, look at the documentation of the RxParser class; it appears to be unavailable on the web and requires a Squeak runtime to browse.

Pete Kirkham

I think you mean "regex implementation" rather than "algorithm" (in the usual sense).

You could test with expressions that are known to cause problems for one approach or the other. You could also look for features that are easier to implement in one or the other (this is not a reliable approach: the developers of regex engines keep finding new ways to implement previously hard things).
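One expression that is known to hurt DFA-based engines is (a|b)*a(a|b){n-1} ("the n-th character from the end is an a"): its NFA has only n+1 states, but the subset construction that turns it into a DFA produces 2ⁿ states. A minimal Python sketch, assuming a hand-built NFA for that pattern (`nfa_for` and `dfa_state_count` are hypothetical names). An engine that builds the whole DFA up front may show a large compile-time or memory cost as n grows; an engine that builds DFA states lazily may not.

```python
from collections import deque


def nfa_for(n: int) -> dict:
    """NFA for (a|b)*a(a|b){n-1}: state 0 loops on a/b and guesses the marked 'a';
    states 1..n count the remaining characters; state n accepts."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def dfa_state_count(n: int) -> int:
    """Number of DFA states reachable via the subset construction."""
    delta = nfa_for(n)
    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for ch in "ab":
            # The DFA state after `ch` is the set of all NFA states reachable.
            nxt = frozenset(s for q in subset for s in delta.get((q, ch), ()))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)
```

For n = 1, 2, 3 this yields 2, 4, 8 states, doubling with each extra (a|b).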

Normally the answer is to read the documentation, or to look in a known reference ("Mastering Regular Expressions" documents the engine type of many popular tools). Finally, why not ask the authors?
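A quick behavioural probe of the kind that book uses distinguishes ordered alternation from leftmost-longest matching: search for `nfa|nfa not` in the text "nfa not". A traditional (backtracking) NFA takes the first alternative that succeeds and reports "nfa"; a POSIX NFA or a DFA reports the longest match, "nfa not". In the sketch below, `search` is a hypothetical adapter you would write around the engine under test; `python_re_search` wraps Python's `re` as an example.

```python
import re


def classify_by_alternation(search) -> str:
    """`search(pattern, text)` must return the matched substring or None."""
    matched = search(r"nfa|nfa not", "nfa not")
    if matched == "nfa":
        return "traditional (backtracking) NFA"
    if matched == "nfa not":
        return "POSIX NFA or DFA (leftmost-longest)"
    return "inconclusive"


def python_re_search(pattern: str, text: str):
    """Example adapter around Python's `re` module."""
    m = re.search(pattern, text)
    return m.group(0) if m else None
```

With the `re` adapter the probe reports a traditional NFA, since Python tries the alternatives in order. Telling a POSIX NFA apart from a DFA needs a further test, such as the timing probe from the other answer.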

Richard
  • I'll accept this answer, because of the obvious suggestion to ask the author. I didn't even think about that :) Pete Kirkham's answer is very valuable, too. – Jan Nov 30 '11 at 15:11