I'm parsing shell output (mocked here) and I'm running into an error where I really don't expect one. A minimal, reproducible, working example is below:
from rich import print as rprint
import typing as tp
from lark import Lark, Transformer, Tree
from lark.indenter import Indenter
class TreeIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types: tp.List = []
    CLOSE_PAREN_types: tp.List = []
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

    @property
    def always_accept(self):
        return (self.NL_type,)
kwargs = {
    "parser": "lalr",
    "postlex": TreeIndenter(),
    "maybe_placeholders": False,
}
text = """
=======================================================================
SKIT SEASON EPISODE CAST NUMBER
=======================================================================
skit_name=vikings 3 10 3
skit_name=parrot 2 5 2
skit_name=eel 1 7 2
"""
grammar = r"""
start: [_NL] header data
header: line_break column_names line_break
data: data_line+
data_line: me_info (STRING2 | STRING)* _NL
me_info: "skit_name="STRING
line_break: "="* _NL
column_names: (STRING | STRING2)* _NL
STRING2 : STRING " " STRING
STRING : ESCAPED_STRING | VALUE
VALUE : ("_" | LETTER | DIGIT | "-" | "[]" | "/" | "." | ":")+
%import common.ESCAPED_STRING
%import common.LETTER
%import common.DIGIT
%ignore / /
_NL: /(\r?\n[\t ]*)+/
"""
parser = Lark(grammar=grammar)
rprint(parser.parse(text))
This outputs the correct tree. Note that kwargs isn't being used here.
However, I need to combine it with a parser for indented output, so I need to use an Indenter and the kwargs listed above. When I pass them to Lark, I get the following error (full trace omitted):
UnexpectedToken: Unexpected token Token('STRING', 'skit_name') at line 5, column 1.
Expected one of:
* __ANON_0
This means the first line of data causes the problem, but it's not obvious what is actually expected.
Interestingly, if the first line break is omitted (from both the text and the grammar), it parses successfully.
Additionally, the error seems to occur when either parser or postlex is included in kwargs, and it's the same error no matter which of them is included.
EDIT: I was hoping to come up with a workaround for the indentation that avoids the parser and postlex keywords, but it seems that specifying the lalr parser is required to use the Transformer, so I need it anyway and can't just sidestep the problem.