
I'm trying to use Python's Lark to parse the output of a query plan from a database. Here's my grammar:

start: op

op: command "(" op ")" "[" text "]"  
    | command "(" op ")" "[" text "]" WORD
    | command "(" op ")" "[" text "]" "," op 
    | command "(" op ")" "[" text "]" WORD "," op 
    | simple 

command: WORD | WORD WORD -> command

simple: /[A-Za-z"._]+/

text : /[A-Za-z0-9=.!%,\-" _:]+/ 

%import common.WORD
%import common.ESCAPED_STRING
%import common.WS
%ignore WS

It's not the best grammar in the world, but it works for simple cases. The problem is that sometimes there are parentheses and brackets inside the text that carry no structural meaning. If I add them to the regex in the `text` rule, it breaks the `op` rule. Is there a simple way to fix this, or do I have to write more complicated rules for the text?

  • There are two ways to do this. One way is to have a maximum length matching pattern. But I haven't seen much on how to perform a maximum length match with the Python 're' module, which is what Lark uses. There is another Python regular expression engine however. The other way is to require escaping of the brackets contained in `text`. – kaby76 Oct 10 '21 at 19:26
  • The problem is the ambiguity about where the `text` terminal ends. If performance is not the biggest concern, [this](https://github.com/lark-parser/lark/blob/master/examples/advanced/dynamic_complete.py) is an example of how to get the correct parse for sure. Otherwise you need to be more specific about what the rules are. – MegaIng Oct 10 '21 at 22:47
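
The escaping idea from the first comment could be sketched as a preprocessing pass that runs before the grammar sees the input. This is only an illustration, not something taken from Lark: the function name `escape_text_brackets` and the sample plan string are made up, and it assumes square brackets inside a text section are always balanced, which the question does not state.

```python
def escape_text_brackets(plan: str) -> str:
    """Backslash-escape brackets and parentheses inside each [...] text section.

    Assumption (not from the question): square brackets inside the text
    are balanced, so the outermost pair can be found by counting depth.
    """
    out = []
    depth = 0  # nesting depth of square brackets
    for ch in plan:
        if ch == "[":
            depth += 1
            # The outermost "[" is a real delimiter; nested ones are escaped.
            out.append(ch if depth == 1 else "\\" + ch)
        elif ch == "]":
            if depth == 0:
                raise ValueError("unbalanced ']'")
            # The "]" that closes the outermost "[" stays a real delimiter.
            out.append(ch if depth == 1 else "\\" + ch)
            depth -= 1
        elif ch in "()" and depth > 0:
            # Parentheses inside a text section are never structural.
            out.append("\\" + ch)
        else:
            out.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '['")
    return "".join(out)


# Hypothetical plan fragment, for illustration only.
escaped = escape_text_brackets("Filter(Scan(t))[a = f(b[0])]")
```

After this pass, the only unescaped `(`, `)`, `[` and `]` left are the structural ones, so the `text` regex would only need to accept a backslash followed by one of those four characters in addition to what it already matches. That grammar change is not shown or tested here.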

0 Answers