I want to write an expression grammar that matches strings like these:
words at the start ONE|ANOTHER wordAtTheEnd
---------^-------- -----^----- -----^------
    A: alphas       B: choice   C: alphas
The issue, however, is that part A can itself contain the keywords "ONE" or "ANOTHER" from part B, so only the last occurrence of a choice keyword should match part B. Here is an example. The string
ZERO ONE or TWO are numbers ANOTHER letsendhere
should be parsed into the fields
A: "ZERO ONE or TWO are numbers"
B: "ANOTHER"
C: "letsendhere"
With pyparsing, I tried the stopOn keyword argument of the OneOrMore expression:
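import pyparsing as pp
# Part B: one of the two choice keywords
choice = pp.Or([pp.Keyword("ONE"), pp.Keyword("ANOTHER")])('B')
# Part A: one or more words, stopping as soon as a choice keyword is seen
start = pp.OneOrMore(pp.Word(pp.alphas), stopOn=choice)('A')
# Part C: a single trailing word
end = pp.Word(pp.alphas)('C')
expr = (start + choice) + end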
But this does not work. For the sample string I get the ParseException:
Expected end of text (at char 12), (line:1, col:13)
"ZERO ONE or >!<TWO are numbers ANOTHER text"
This makes sense, because stopOn stops on the first occurrence of choice, not the last one. How can I write a grammar that stops on the last occurrence instead? Do I need to resort to a context-sensitive grammar?
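One direction that might work without leaving pyparsing is to make the stop condition stricter: stop only at a choice keyword that is followed by exactly one word and then the end of the string. The sketch below assumes that part C is always a single alphabetic word at the very end of the input, as in the example above. The names word and last_choice are hypothetical helpers introduced for illustration, and Combine with adjacent=False is used only to join the words of part A into one string.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
choice = pp.Or([pp.Keyword("ONE"), pp.Keyword("ANOTHER")])

# Assumption: part C is exactly one word followed by the end of the string.
# The repetition then stops only at a keyword that satisfies this lookahead,
# which is the last keyword that can still serve as part B.
last_choice = choice + word + pp.StringEnd()

# Combine(..., adjacent=False) keeps whitespace skipping between words and
# joins the matched words of part A with single spaces.
start = pp.Combine(
    pp.OneOrMore(word, stopOn=last_choice), joinString=" ", adjacent=False
)("A")
expr = start + choice("B") + word("C")

sample = "ZERO ONE or TWO are numbers ANOTHER letsendhere"
result = expr.parseString(sample, parseAll=True)
print(result["A"], "|", result["B"], "|", result["C"])
```

With this stop condition, the earlier "ONE" is consumed as an ordinary word of part A, because it is not followed by a single word and the end of the string. If part C could span several words, the lookahead would have to be adapted accordingly.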