I had a play with this, which would have been a whole lot easier and 15 minutes quicker for me if you had put a complete minimal reproducible example (MRE) in your question. Please do that next time, because I don't intend in future to spend 15 minutes recreating something that should be in your question. In particular, make it a single block of code complete with all needed imports, and if you feel the urge to split it up with blocks of text, please resist and use Python comments in place of that text.
So here's a free MRE.
One thing: I didn't get the same result as you for a single print("HI") statement. I got a Token of type STRING with the value "HI".
First I added the -> statements alias to the grammar. Then I removed the string = str line, because that just isn't right: it converts value (which is always a list) into the string representation of that list. Then I added a string() transformer and a statements() transformer. Making them print their input and return values makes it a bit easier to see what's going on. On the project I used Lark on, I kept these prints (identifying the transformer function and its input/output) in place until I'd got it all stable and working, so I could check, for example, that an identifier was correctly being transformed to a URI, or that the two values for an addition were being added and a single value returned.
Each transformer function takes value, which is always a list. For a unary thing like print or string, return the first (and only) item in it. An operator which takes two inputs, such as an addition, would get the two operands as two entries in value, add them, and return the result; that's transformation.
A Token is a str with metadata; you can see the results when you run this code.
from lark import Lark
from lark.indenter import Indenter
from lark import Transformer, v_args
grammar = """
?start: expr
| statement* -> statements // ADDED
?expr: STRING -> string
?statement: "print" "(" expr ")" ";" -> print_statement
%import common.ESCAPED_STRING -> STRING
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL
"""
class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_type = '_INDENT'  # Indenter expects INDENT_type, not INDENT_TYPE
    DEDENT_type = '_DEDENT'
    tab_len = 8
class MainTransformer(Transformer):
    # string = str # REMOVED
    def string(self, value): # ADDED
        print( f"string {value=}" )
        res = value[0] # this seems like quite a common thing to do for a unary thing like string - return value[0]
        print( f"string returning {res}" )
        return res

    def print_statement(self, value):
        print( f"print_statement {value=}" )
        # value = str(value).strip('"')
        res = value[0] # this seems like quite a common thing to do for a unary thing like print - return value[0]
        print( f"print_statement returning {res}" )
        return res

    def statements(self, value): # ADDED
        print( f"statements {value=}" )
        for i,v in enumerate(value):
            print( f" {i=} {v=}" )
        return value
parser = Lark(grammar, parser="lalr", transformer=MainTransformer(), postlex=MainIndenter())
main_parser = parser.parse
hiho_input_str = '''
print("HI");
print("HO");
print("HI");
print("HO");
'''
hihoresult = main_parser(hiho_input_str)
print( "hiho result=")
for i,hiho in enumerate(hihoresult):
    print(f" {i} {hiho}")
print()
hi_input_str = '''
print("HI");
'''
print("Hi result=",main_parser(hi_input_str))
Results:
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
statements value=[Token('STRING', '"HI"'), Token('STRING', '"HO"'), Token('STRING', '"HI"'), Token('STRING', '"HO"')]
i=0 v=Token('STRING', '"HI"')
i=1 v=Token('STRING', '"HO"')
i=2 v=Token('STRING', '"HI"')
i=3 v=Token('STRING', '"HO"')
hiho result=
0 "HI"
1 "HO"
2 "HI"
3 "HO"
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
statements value=[Token('STRING', '"HI"')]
i=0 v=Token('STRING', '"HI"')
Hi result= [Token('STRING', '"HI"')]
If you think you might want to change what string() returns, make that change first, because it ripples up through every other transformer method that receives its result.