
I was trying to create a `let` grammar. My idea was something like this:

start : let
let : "let" ID ("=" let)? "in" let | atom
atom : ANYTHING | "(" let ")"
ID : /[a-z]+/

The idea is to parse expressions like `let A = B in C` or `let A in B`, or both mixed: `let f = let x in x + 1 in f(1)`. I also want to support parentheses to disambiguate, as in `let A = (let b in b + 1) in A(1) + 1`.

I'm using Lark with the LALR parser, but I'm struggling with the grammar and can't define an unambiguous one for this.

I tried


from lark import Lark, Transformer as LarkTransformer


grammar = """
    start : expr
    expr : LET ID (EQUAL exprcont)? IN exprcont | exprcont
    exprcont : ANYTHING | LPAR expr RPAR | expr
    ANYTHING.0 : /.+/
    LET : "let"
    IN : "in"
    ID : /[a-z_][a-z0-9_]*/
    EQUAL : "="
    LPAR.10 : "("
    RPAR.10 : ")"

    %import common.WS
    %ignore WS
"""

let_parser = Lark(grammar, parser="lalr")

print(let_parser.parse("let a = 1 in let b = 2 in a + b").pretty())

But I got lots of reduce/reduce errors:

Traceback (most recent call last):
  File "/Users/gecko/code/lampycode/letparser.py", line 55, in <module>
    let_parser = Lark(grammar, parser="lalr")
  File "/Users/gecko/.pyenv/versions/lampy/lib/python3.9/site-packages/lark/lark.py", line 339, in __init__
    self.parser = self._build_parser()
  File "/Users/gecko/.pyenv/versions/lampy/lib/python3.9/site-packages/lark/lark.py", line 373, in _build_parser
    return self.parser_class(self.lexer_conf, parser_conf, options=self.options)
  File "/Users/gecko/.pyenv/versions/lampy/lib/python3.9/site-packages/lark/parser_frontends.py", line 145, in __init__
    self.parser = LALR_Parser(parser_conf, debug=debug)
  File "/Users/gecko/.pyenv/versions/lampy/lib/python3.9/site-packages/lark/parsers/lalr_parser.py", line 17, in __init__
    analysis.compute_lalr()
  File "/Users/gecko/.pyenv/versions/lampy/lib/python3.9/site-packages/lark/parsers/lalr_analysis.py", line 304, in compute_lalr
    self.compute_lalr1_states()
  File "/Users/gecko/.pyenv/versions/lampy/lib/python3.9/site-packages/lark/parsers/lalr_analysis.py", line 279, in compute_lalr1_states
    raise GrammarError('\n\n'.join(msgs))
lark.exceptions.GrammarError: Reduce/Reduce collision in Terminal('$END') between the following rules: 
    - <exprcont : expr>
    - <start : expr>

Reduce/Reduce collision in Terminal('IN') between the following rules: 
    - <expr : exprcont>
    - <expr : LET ID IN exprcont>

Reduce/Reduce collision in Terminal('RPAR') between the following rules: 
    - <expr : exprcont>
    - <expr : LET ID IN exprcont>

Reduce/Reduce collision in Terminal('$END') between the following rules: 
    - <expr : exprcont>
    - <expr : LET ID IN exprcont>

Reduce/Reduce collision in Terminal('IN') between the following rules: 
    - <expr : exprcont>
    - <expr : LET ID EQUAL exprcont IN exprcont>

Reduce/Reduce collision in Terminal('RPAR') between the following rules: 
    - <expr : exprcont>
    - <expr : LET ID EQUAL exprcont IN exprcont>

Reduce/Reduce collision in Terminal('$END') between the following rules: 
    - <expr : exprcont>

I have no idea how to define this grammar. The idea is so simple: `let : "let" ID ("=" let)? "in" let | atom`. Any ideas?

geckos
  • Your first idea in your question is fine. Why did you choose not to use it? – rici Dec 21 '20 at 18:45
  • @rici I tried with `ANYTHING: /.+/`, and ANYTHING swallows the let expressions without giving the lets a chance to be parsed. Here is an example: `Token('ANYTHING', 'let a = 1 in a')` – geckos Dec 21 '20 at 22:45
  • Yes, that's true. "Anything" is never a reasonable part of a grammar (since it must extend to the end of input). I assumed that you meant something different, like "any other token". But that's not the cause of the conflicts. Your `let` non-terminal has correct recursion. But the translation into `expr` is subtly different and incorrect. – rici Dec 22 '20 at 00:31
  • I tried lowering the priority of ANYTHING so that the let keyword is matched first, but it didn't work – geckos Dec 22 '20 at 06:56
  • You'll find that even if you don't need a detailed parse, it is usually much simpler to define the entire lexical structure (which is not that much work), and at least enough of the syntax to roughly analyse the input. – rici Dec 22 '20 at 18:08
  • Hi @rici! Yeah, I ended up rolling my own recursive descent parser, and now I'm struggling to allow let expressions inside ANYTHING, which was a terminal in the original grammar. I want to do a text transformation like s/let(.*)in/let(\1)in/; in other words, I want to put parentheses between the let and in keywords. `ANYTHING` would mean something like any possible Python expression. But as soon as I wanted let expressions inside ANYTHING, it blew up, literally. So I'm wondering whether I should try to parse the Python expression or come up with something simpler that would solve the problem – geckos Dec 22 '20 at 22:51
  • `in` is a Python operator, so you're going to end up with ambiguous expressions. – rici Dec 22 '20 at 23:22
  • I'm using tree transformations to turn `let(A) in B` into `let(A)(b)`, which is a function that returns the AST for `(lambda A: b)`, and the form `let(a=b) in c` into `(lambda a:c)(a=b)`. This already works fine, but I want to remove the need for the parentheses in `let()` – geckos Dec 23 '20 at 13:00
  • In the parenthesised forms, `in` isn't ambiguous because it's redundant: it must be the token which follows the `)` which matches the `(` in `let (`. But if you remove the parentheses, the grammar can no longer distinguish between the two possible parses for `let present = a in b in expression`. That problem can't be fixed with operator precedence either. But I think this whole comment thread is way out of scope. Maybe there's a more specific question you can ask (as a new question). – rici Dec 25 '20 at 05:42
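
The suggestion in the comments to define the lexical structure first can be sketched without any parser library. This is only a rough illustration under two assumptions: `in` never appears as a Python operator inside the expressions (the ambiguity rici points out), and every non-keyword token can be kept as opaque text. The names `tokenize` and `parse_let`, the token classes, and the tuple-based tree shape are all made up for this example.

```python
import re

# Assumed token classes: words, integers, and a few single-character operators.
TOKEN_RE = re.compile(r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[()=+\-*/,]))")
KEYWORDS = {"let", "in"}


def tokenize(text):
    """Split text into a flat list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(match.lastgroup))
        pos = match.end()
    return tokens


def parse_let(tokens):
    """Recursive descent for: expr := "let" NAME ["=" expr] "in" expr | chunk."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, got {peek()!r}")
        pos += 1

    def expr():
        nonlocal pos
        if peek() != "let":
            return chunk()
        pos += 1
        name = peek()
        if name is None or name in KEYWORDS or not re.fullmatch(r"[A-Za-z_]\w*", name):
            raise SyntaxError(f"expected a name after 'let', got {name!r}")
        pos += 1
        value = None
        if peek() == "=":  # optional binding: let NAME = expr in expr
            pos += 1
            value = expr()
        expect("in")
        return ("let", name, value, expr())

    def chunk():
        # A run of opaque tokens; stops at a keyword or a closing parenthesis.
        nonlocal pos
        items = []
        while peek() not in (None, "let", "in", ")"):
            if peek() == "(":
                pos += 1
                items.append(("group", expr()))
                expect(")")
            else:
                items.append(peek())
                pos += 1
        if not items:
            raise SyntaxError(f"expected an expression, got {peek()!r}")
        return items

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_let(tokenize("let f = let x in x + 1 in f(1)")))
```

Because a chunk stops at `let`, a nested let is only accepted directly after `=`, directly after `in`, or inside parentheses. That is what keeps this sketch free of the ambiguity discussed above.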

2 Answers


I think the problem is

start : expr
expr : ... | exprcont
exprcont : ... | expr

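This loop (`expr` derives `exprcont`, which derives `expr` again) means that your grammar is ambiguous: the same input has infinitely many derivations.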

Can you get rid of the loop?

Frank Yellin

If you want to use a terminal like ANYTHING, don't use lalr; use earley (and even then it will still create problems).


But this is not actually what is creating these errors. The problem is the mutual recursion in expr and exprcont. You can just remove the expr in exprcont:

exprcont : ANYTHING | LPAR expr RPAR

But this still won't work (unless you use parser='earley' and lexer='dynamic_complete', but that will be very slow). You have to redesign the grammar to not include an ANYTHING terminal.

MegaIng