1

I have build a grammar for propositional formulas, however I have found that it is ambiguous, i.e., there is more than one derivation tree for my input formulas.

This is the smallest ambiguous grammar I have found

_PL_grammar = (
    r"""
    ?start: formula
    ?formula: "(" formula ")"
            | logical_expr
            | ident 
    ?logical_expr: and_expr
    ?and_expr:  formula ("&&" | "&") formula
    ident: CNAME
    %import common.CNAME
    %import common.WS_INLINE
    %ignore WS_INLINE
    """)

I am executing this piece of code with lark==1.1.2

import lark
def parse_formula(formula):
    logic_parser = lark.Lark(_PL_grammar )
    parsed_formula = logic_parser.parse(formula)
    return parsed_formula
parsed_formula = parse_formula('a && b && c')
print(parsed_formula)

Which returns arbitrary one of these two outputs:

Tree('and_expr', [Tree('and_expr', [Tree('ident', [Token('CNAME', 'a')]), Tree('ident', [Token('CNAME', 'b')])]), Tree('ident', [Token('CNAME', 'c')])])

or

Tree('and_expr', [Tree('ident', [Token('CNAME', 'a')]), Tree('and_expr', [Tree('ident', [Token('CNAME', 'b')]), Tree('ident', [Token('CNAME', 'c')])])])

Which would correspond to formulas ((a && b) && c) and (a && (b && c)) respectively

How can I remove the ambiguity of this grammar? In other words, is there a way to make an input always have only one unique output?

djimenez
  • 41
  • 3
  • Which of those two parses do you want? – rici Mar 15 '22 at 14:10
  • @rici I don't care, but let's say that the first `((a && b) && c)` – djimenez Mar 15 '22 at 14:26
  • In that case, the right-hand operand of `&&` cannot itself be an `and_expr`. You need to write the grammar in cascading precedence style, like any of the example algebraic grammars you have undoubtedly seen. (Such as [this one](https://github.com/lark-parser/lark/blob/master/examples/calc.py).) – rici Mar 16 '22 at 03:04

0 Answers0