I'm making a programming language using Lark, and I'm trying to parse multiple statements from a file. When I parse

print("HI");
print("HI");

It returns

Tree('start', ['HI', 'HI'])

But when I parse

print("Hi");

It returns

Hi

Here's roughly what my grammar looks like:

?start: expr
      | statement*
?expr: STRING -> string
?statement: "print" "(" expr ")" ";" -> print_statement

%import common.ESCAPED_STRING -> STRING 
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL

And here's how my transformer file works:

from lark import Transformer, v_args
class MainTransformer(Transformer):
  string = str
  def print_statement(self, value):
    value = str(value).strip('"')
    return value

And here's how my indenter code works:

from lark.indenter import Indenter

class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_type = '_INDENT'  # Lark's Indenter expects the lower-case suffix: INDENT_type
    DEDENT_type = '_DEDENT'
    tab_len = 8

And here's my main.py file:

from lark import Lark
from transformer import MainTransformer
from indenter import MainIndenter
parser = Lark.open("main_parser.lark", parser="lalr", transformer=MainTransformer(), postlex=MainIndenter())
main_parser = parser.parse

input_str = '''
print("HI");
print("HI");
'''
print(main_parser(input_str))

Help would be appreciated, thanks!

  • Making progress. It’s probably an optimisation in Lark that if there’s a single result it’s unwrapped. Try putting a transformer on `statement*` which turns the multiple values into whatever you want to be the result. That goes back to something I mentioned on an earlier question of yours: you need to decide what you want your parser/transformer to have as a result. Second, please combine your snippets into one single copy/paste-able segment of code so it’s easy for anyone to paste into a single file and run; make it easy to help. – DisappointedByUnaccountableMod Nov 03 '20 at 23:21
  • Alright, I'll do that, thanks! – Rayyan Cyclegar Nov 03 '20 at 23:33
  • So I make a transformer for my `statement*`? – Rayyan Cyclegar Nov 03 '20 at 23:34
  • What I want it to return is `HI` if I do `print("HI")` two times in the file – Rayyan Cyclegar Nov 03 '20 at 23:35
  • There are probably many ways of doing it but at least if you do that on `statement*` then those will be processed. Are you really saying you only want one HI for two print statements - not sure how to do that, doesn’t sound right/obvious. – DisappointedByUnaccountableMod Nov 03 '20 at 23:48
  • No, I want one HI for each print statement – Rayyan Cyclegar Nov 03 '20 at 23:55
  • Phew. OK so add your transformer for statement* and you should be able to do that. – DisappointedByUnaccountableMod Nov 03 '20 at 23:56

1 Answer

I had a play with this, which would have been a whole lot easier and 15 minutes quicker for me if you had put a complete minimal reproducible example (MRE) in your question. Please do that next time, because I don't intend to spend 15 minutes in future recreating something that should be in your question. In particular, make it a single block of code complete with all needed imports, and if you feel the urge to split it up with blocks of text, please resist and use Python comments in place of that text.

So here's a free MRE.

One thing is I didn't get the same result for a single print statement "HI" - I got a token with a STRING value "HI".

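First I added the -> statements alias to the statement* alternative of start, so that the list of statements gets its own transformer function.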

Then I removed string = str because it just isn't right: the transformer is handed value, which is always a list of the rule's children, so str converts that whole list into its printed representation as one string.
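To see why string = str misbehaves without involving Lark at all, here is a small plain-Python sketch. The class name Demo and the hand-built children list are stand-ins of mine; the assumption is only that the transformer looks the callback up by rule name and calls it with the list of children, which is roughly what Lark does.

```python
class Demo:
    # Same shape as the original transformer: the attribute is the built-in str type.
    string = str


# Stand-in for the list of children that a rule callback receives.
children = ['"HI"']

# Look the callback up by name and call it with the whole list,
# which is equivalent to str(children).
callback = getattr(Demo(), "string")
result = callback(children)

# The result is the text of the list, brackets and quotes included,
# not the single string inside it.
print(result)
print(result == str(children))
```

Returning value[0] from an explicit string() method avoids this, because it picks the one child out of the list instead of stringifying the list.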

Then I added a string() transformer and the statements() transformer. Making them print their input and return values makes it a bit easier to see what's going on. On the project I used Lark on, I kept these prints (identifying the transformer function and its input/output) in place until I'd got it all stable and working, so I could check, for example, that an identifier was correctly being transformed to a URI, or that the two values for an addition were being added and a single value returned.

Each transformer function takes value, which is always a list. For a unary operator like print or string, it returns the first (only) item in that list. An operator which takes two inputs, such as addition, would get its two operands as two entries in value, add them, and return the result; that's transformation.

A Token is a str with metadata; you can see this in the results when you run the code below.

from lark import Lark
from lark.indenter import Indenter
from lark import Transformer, v_args

grammar = """
?start: expr
      | statement* -> statements // ADDED
?expr: STRING -> string
?statement: "print" "(" expr ")" ";" -> print_statement

%import common.ESCAPED_STRING -> STRING 
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL
"""

class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_type = '_INDENT'  # Lark's Indenter expects the lower-case suffix: INDENT_type
    DEDENT_type = '_DEDENT'
    tab_len = 8


class MainTransformer(Transformer):
#    string = str # REMOVED
    def string(self,value): # ADDED
        print( f"string {value=}" )
        res = value[0]  # this seems like quite a common thing to do for a unary thing like string - return value[0]
        print( f"string returning {res}" )
        return res
    
    def print_statement(self, value):
        print( f"print_statement {value=}" )
#        value = str(value).strip('"')
        res = value[0]  # this seems like quite a common thing to do for a unary thing like print - return value[0]
        print( f"print_statement returning {res}" )
        return res
        
    def statements(self,value): # ADDED
        print( f"statements {value=}" )
        for i,v in enumerate(value):
            print( f"  {i=} {v=}" )
        return value

parser = Lark(grammar, parser="lalr", transformer=MainTransformer(), postlex=MainIndenter())

main_parser = parser.parse

hiho_input_str = '''
print("HI");
print("HO");
print("HI");
print("HO");
'''

hihoresult = main_parser(hiho_input_str)
print( "hiho result=")
for i,hiho in enumerate(hihoresult):
    print(f"  {i} {hiho}")
print()

hi_input_str = '''
print("HI");
'''

print("Hi result=",main_parser(hi_input_str))

Results:

string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
statements value=[Token('STRING', '"HI"'), Token('STRING', '"HO"'), Token('STRING', '"HI"'), Token('STRING', '"HO"')]
  i=0 v=Token('STRING', '"HI"')
  i=1 v=Token('STRING', '"HO"')
  i=2 v=Token('STRING', '"HI"')
  i=3 v=Token('STRING', '"HO"')
hiho result=
  0 "HI"
  1 "HO"
  2 "HI"
  3 "HO"

string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
statements value=[Token('STRING', '"HI"')]
  i=0 v=Token('STRING', '"HI"')
Hi result= [Token('STRING', '"HI"')]

If you want to change what string returns (for example, to strip the surrounding quotes), do that first, because the change ripples up through every transformer function that receives the result.