I have an unbounded sequence of strings and a large set of regular expressions ordered by priority. For each string in the sequence I have to find the first matching regular expression and the matched substring. The strings are not very long (<1 KB), while the number of regular expressions may vary from hundreds to thousands.

I'm looking for a Java tool that would do this job efficiently. I guess the technique should be building a DFA ahead of time.

My current option is JFlex. The problem I can't work around in JFlex is that its rules have no priorities and JFlex looks for the rule matching the longest part of the text.

My question is whether my problem can be solved with JFlex. If not, can you suggest another Java tool or technique that would?

Brian Tompsett - 汤莱恩
depthofreality

1 Answer

You could use Java regexes. Build the alternatives into a single RE string, with each alternative surrounded by '(' and ')+?' and separated by '|', with the highest-priority REs first. The '+?' quantifier is reluctant rather than greedy, so each sub-RE matches as few repetitions as possible, and '|' alternatives are evaluated left to right at each starting position, so the highest-priority REs are tried first.
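The construction described above could be sketched roughly as follows. The class and helper names (PriorityAlternation, combine, firstMatch) are made up for illustration, and the sketch assumes that no rule contains capturing groups of its own, so that group number i + 1 corresponds to rule i. Note that java.util.regex is a backtracking engine, not a precompiled DFA, so this only addresses the priority part of the question.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PriorityAlternation {
    // Hypothetical helper: wraps each rule as (rule)+? and joins the rules with '|'.
    // Assumption: rules contain no capturing groups, so group i + 1 belongs to rule i.
    static Pattern combine(List<String> rules) {
        String re = rules.stream()
                .map(r -> "(" + r + ")+?")
                .collect(Collectors.joining("|"));
        return Pattern.compile(re);
    }

    // Returns (rule index, matched text) for the first match found, or null if none.
    static Map.Entry<Integer, String> firstMatch(Pattern combined, int ruleCount, String input) {
        Matcher m = combined.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Exactly one alternative took part in the match; its group is non-null.
        for (int i = 1; i <= ruleCount; i++) {
            if (m.group(i) != null) {
                return Map.entry(i - 1, m.group(i));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> rules = List.of("one", "onetwo"); // highest priority first
        Map.Entry<Integer, String> hit =
                firstMatch(combine(rules), rules.size(), "zeroonetwothreefour");
        // prints: rule 0 matched 'one'
        System.out.println("rule " + hit.getKey() + " matched '" + hit.getValue() + "'");
    }
}
```

If the rules may contain groups of their own, one option is to rewrite them to use non-capturing groups '(?:...)' before combining them.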

For example, given the string "zeroonetwothreefour":

'(one)+?|(onetwo)+?' will match 'one'
'(onetwo)+?|(one)+?' will match 'onetwo'
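'(twothree)+?|(onetwothree)+?' will match 'onetwothree'

Note especially that in the last example the left-to-right order only decides between alternatives that can match at the same starting position. Matcher.find() returns the leftmost match, and 'onetwothree' starts earlier in the target string than 'twothree', so it wins even though it is listed second.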
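If a higher-priority rule must win regardless of where in the string its match starts, a simple alternative is to precompile every rule once and try them in priority order. The sketch below is an illustration under that assumption, with made-up names (PriorityFirst, compileAll, firstByPriority). It does one scan per rule, so its cost grows linearly with the number of rules, and it is not a DFA-based solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriorityFirst {
    // Compile each rule once, keeping the list in priority order (highest first).
    static List<Pattern> compileAll(List<String> rules) {
        List<Pattern> patterns = new ArrayList<>();
        for (String r : rules) {
            patterns.add(Pattern.compile(r));
        }
        return patterns;
    }

    // Returns the text matched by the highest-priority rule that matches
    // anywhere in the input, or null if no rule matches.
    static String firstByPriority(List<Pattern> patterns, String input) {
        for (Pattern p : patterns) {
            Matcher m = p.matcher(input);
            if (m.find()) {
                return m.group();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compileAll(List.of("twothree", "onetwothree"));
        // prints: twothree
        System.out.println(firstByPriority(patterns, "zeroonetwothreefour"));
    }
}
```

Here 'twothree' is returned because its rule is tried first, even though 'onetwothree' would match starting at an earlier position in the string.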

Helder Pereira
Alan Burlison
  • An excellent idea combining greedy with | to make a 'priority search' using regular expressions! – George Aug 31 '12 at 11:06