
I am trying to build a Boolean logic parser that turns input such as `A == B AND C == D` into something like `And(Equals(A,B), Equals(C,D))`.

My parser has the following definitions:

def program: Parser[Operator] = {
    phrase(operator)
}
def operator: PackratParser[Operator] = {
    leaf | node
}
def node: PackratParser[Operator] = {
    and | or 
}
def leaf: PackratParser[Operator] = {
    equal | greater | less
}
def and: PackratParser[Operator] = {
    (operator ~ ANDT() ~ operator) ^^ {
      case left ~ _ ~ right => And(left, right)}
}

I would expect the parser to follow program -> operator -> node -> and, then parse the left operator -> leaf -> equal and the right operator -> leaf -> equal. This doesn't work. However, if I make the following change to the code above:

def operatorWithParens: PackratParser[Operator] = {
    lparen ~> (operator | operatorWithParens) <~ rparen
}

and change `and` to

def and: PackratParser[Operator] = {
    (operatorWithParens ~ ANDT() ~ operatorWithParens) ^^ {
      case left ~ _ ~ right => And(left, right)}
}

Parsing `(A == B) AND (C == D)` succeeds.

I can't wrap my head around why the former doesn't work while the latter does. How should I change my code to be able to parse `A == B AND C == D`?

EDIT: Following @Andrey Tyukin's advice, I've modified the grammar to account for precedence:

def program: Parser[Operator] = positioned {
    phrase(expr)
}
def expr: PackratParser[Operator] = positioned {
    (expr ~ ORT() ~ expr1) ^^ {
      case left ~ _ ~ right => Or(left, right)} | expr1
}
def expr1: PackratParser[Operator] = positioned {
    (expr1 ~ ANDT() ~ expr2) ^^ {
      case left ~ _ ~ right => And(left, right)} | expr2
}
def expr2: PackratParser[Operator] = positioned {
    (NOTT() ~ expr2) ^^ {case _ ~ opr => Not(opr)} | expr3
}
def expr3: PackratParser[Operator] = {
    lparen ~> (expr) <~ rparen | leaf
}

Although PackratParsers supports left-recursive grammars, I run into an infinite loop that never leaves `expr`.
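One plausible cause of the loop (an assumption, since the full parser isn't shown) is that every production is a `def`. The `PackratParsers` documentation says each production previously declared as a `def` should become a `lazy val` of type `PackratParser[...]`. Memoization, and with it left-recursion detection, is keyed on the parser instance, and a `def` builds a fresh parser every time `expr` is referenced, so the recursive `expr` never recognizes itself. Below is a minimal sketch of the same OR/AND/NOT layering written with `lazy val`s. It assumes `RegexParsers` over plain text instead of the question's token classes (`ORT()`, `ANDT()`, `NOTT()`), uses a hypothetical identifier-only AST, and drops `positioned` for brevity.

```scala
import scala.util.parsing.combinator.{PackratParsers, RegexParsers}

// Hypothetical AST named after the question's output; operands are identifiers.
sealed trait Operator
case class Equals(left: String, right: String) extends Operator
case class Greater(left: String, right: String) extends Operator
case class Less(left: String, right: String) extends Operator
case class Not(operand: Operator) extends Operator
case class And(left: Operator, right: Operator) extends Operator
case class Or(left: Operator, right: Operator) extends Operator

object BoolParser extends RegexParsers with PackratParsers {
  // Identifiers such as A, B or foo_1 (keywords are not excluded in this sketch).
  lazy val ident: Parser[String] = """[A-Za-z_][A-Za-z0-9_]*""".r

  // Comparisons play the role of `leaf` from the question.
  lazy val leaf: Parser[Operator] =
    ident ~ ("==" | ">" | "<") ~ ident ^^ {
      case l ~ "==" ~ r => Equals(l, r)
      case l ~ ">" ~ r  => Greater(l, r)
      case l ~ _ ~ r    => Less(l, r)
    }

  // Same layering as the question's EDIT: OR binds loosest, then AND, then NOT.
  // Because these are lazy vals, every reference to `expr` is the same memoized
  // PackratParser, which is what lets the left recursion terminate.
  lazy val expr: PackratParser[Operator] =
    expr ~ ("OR" ~> expr1) ^^ { case l ~ r => Or(l, r) } | expr1

  lazy val expr1: PackratParser[Operator] =
    expr1 ~ ("AND" ~> expr2) ^^ { case l ~ r => And(l, r) } | expr2

  lazy val expr2: PackratParser[Operator] =
    "NOT" ~> expr2 ^^ { op => Not(op) } | expr3

  lazy val expr3: PackratParser[Operator] =
    "(" ~> expr <~ ")" | leaf

  // parseAll requires the whole input to be consumed.
  def parse(input: String): Either[String, Operator] = {
    val result = parseAll(expr, input)
    if (result.successful) Right(result.get) else Left(result.toString)
  }
}

object BoolParserDemo {
  def main(args: Array[String]): Unit =
    println(BoolParser.parse("A == B AND C == D"))
}
```

With this layering `BoolParserDemo` should print `Right(And(Equals(A,B),Equals(C,D)))`, and a chain such as `A == B AND C == D AND E == F` groups to the left, because the left-recursive `expr1` keeps growing its result from the left.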

Alexandru Barbarosie
  • What is `phrase`? What is `or`? It's somehow incomplete. Would it maybe be possible to provide the complete parser, with all the imports, ideally as an ammonite script with all dependencies? – Andrey Tyukin Aug 17 '22 at 22:00
  • Does it by any chance generate `Equals(And(Equals(A, B), C),D)`? In other words it is parsed as `((A == B) AND C) == D`? Because without operator precedence that is what you would expect. We parse `A*B/C*D` differently from `A*B + C*D` because `+` has lower precedence than `*` but `/` has the same precedence as `*`. Operator precedence has to be expressed in the grammar. – Tim Aug 17 '22 at 22:19
  • When asking questions about parser combinators, you should specify which library is being used. Based on the presence of `^^` I would guess scala-parser-combinators? That is very slow and buggy and there are much better alternatives available (e. g. cats-parse). – Matthias Berndt Aug 17 '22 at 23:24
  • @MatthiasBerndt Yes it is using scala-parser-combinators. The clue is the `packrat-parsing` tag and the word `PackratParser` in the title and in the question itself. – Tim Aug 18 '22 at 06:37

1 Answer


It looks like there is a path from operator to a shorter operator:

operator -> node -> and -> (operator ~ somethingElse)

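You seem to be assuming that the shorter `operator` (left) will somehow reduce to `leaf`, whereas the outermost `operator` would skip the `leaf` and pick the `node`, for whatever reason. What it does instead is just choking on the first leaf it encounters: in `leaf | node`, `leaf` is tried first and successfully matches `A == B`; `|` does not try `node` once an alternative has succeeded, so `phrase` is then left with the unconsumed `AND C == D` and fails.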

You could try to move `node` before `leaf`, so that the whole `operator` doesn't choke on the first `A` when it sees something like `A == B AND ...`.
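The ordering effect can be seen in isolation in a deliberately tiny sketch. The one-literal `leaf` and the non-recursive `node` below are hypothetical simplifications, not the question's grammar; the point is only that `|` tries its alternatives in order and returns the first success, and `parseAll` does not go back to try another alternative when input is left over.

```scala
import scala.util.parsing.combinator.RegexParsers

object OrderedChoice extends RegexParsers {
  // Hypothetical stand-ins: `leaf` recognizes only the literal text "A == B",
  // and `node` is a non-recursive conjunction of two leaves.
  val leaf: Parser[String] = "A == B"
  val node: Parser[String] =
    leaf ~ "AND" ~ leaf ^^ { case l ~ _ ~ r => s"And($l, $r)" }

  // `|` is ordered choice: the right side is tried only if the left side fails.
  val leafFirst: Parser[String] = leaf | node
  val nodeFirst: Parser[String] = node | leaf

  // Some(result) only if `p` consumes the whole input.
  def run(p: Parser[String], input: String): Option[String] = {
    val result = parseAll(p, input)
    if (result.successful) Some(result.get) else None
  }
}

object OrderedChoiceDemo {
  def main(args: Array[String]): Unit = {
    val input = "A == B AND A == B"
    println(OrderedChoice.run(OrderedChoice.leafFirst, input)) // None
    println(OrderedChoice.run(OrderedChoice.nodeFirst, input)) // Some(And(A == B, A == B))
  }
}
```

With `leafFirst`, `leaf` matches `A == B`, `node` is never tried, and the leftover `AND A == B` makes `parseAll` fail; `nodeFirst` accepts the whole input. In the real grammar `node` is left-recursive, so reordering alone is not a complete fix.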

Otherwise, I'd suggest refactoring it into

  • disjunctions
  • of conjunctions
  • of atomic formulas

where atomic formulas are either

  • comparisons or
  • indivisible parenthesized top-level elements (i.e. parenthesized disjunctions, in this case).

Expect to use quite a few `repsep`s.
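A minimal sketch of that layering, assuming `RegexParsers`, plain identifiers as operands and a hypothetical AST named after the question's `And`/`Or`/`Equals`. It uses `rep1sep`, the variant of `repsep` that requires at least one element, so that `reduceLeft` never sees an empty list. Nothing here is left-recursive, so no packrat machinery is needed, and precedence falls out of the nesting: OR binds loosest, AND tighter, comparisons and parentheses tightest.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical AST named after the question's output; operands are identifiers.
sealed trait Operator
case class Equals(left: String, right: String) extends Operator
case class Greater(left: String, right: String) extends Operator
case class Less(left: String, right: String) extends Operator
case class And(left: Operator, right: Operator) extends Operator
case class Or(left: Operator, right: Operator) extends Operator

object RepsepParser extends RegexParsers {
  lazy val ident: Parser[String] = """[A-Za-z_][A-Za-z0-9_]*""".r

  // Comparison: A == B, A > B or A < B.
  lazy val comparison: Parser[Operator] =
    ident ~ ("==" | ">" | "<") ~ ident ^^ {
      case l ~ "==" ~ r => Equals(l, r)
      case l ~ ">" ~ r  => Greater(l, r)
      case l ~ _ ~ r    => Less(l, r)
    }

  // Atomic formula: a comparison, or a parenthesized top-level disjunction.
  lazy val atom: Parser[Operator] =
    comparison | "(" ~> disjunction <~ ")"

  // Conjunction: one or more atoms separated by AND, folded to the left.
  lazy val conjunction: Parser[Operator] =
    rep1sep(atom, "AND") ^^ { atoms => atoms.reduceLeft[Operator](And(_, _)) }

  // Disjunction: one or more conjunctions separated by OR, folded to the left.
  lazy val disjunction: Parser[Operator] =
    rep1sep(conjunction, "OR") ^^ { cs => cs.reduceLeft[Operator](Or(_, _)) }

  def parse(input: String): Either[String, Operator] = {
    val result = parseAll(disjunction, input)
    if (result.successful) Right(result.get) else Left(result.toString)
  }
}

object RepsepParserDemo {
  def main(args: Array[String]): Unit =
    println(RepsepParser.parse("A == B OR C == D AND E == F"))
}
```

Here the demo should print `Right(Or(Equals(A,B),And(Equals(C,D),Equals(E,F))))`: the AND binds tighter because conjunctions sit one level below disjunctions, and `reduceLeft` makes chains of the same operator left-associative.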

Andrey Tyukin
  • I think moving `node` before `leaf` just makes it right associative rather than left associative so it still wouldn't parse correctly. It needs to implement operator precedence in order to parse the way that is wanted. – Tim Aug 17 '22 at 22:27
  • `just makes it right associative rather than left associative so it still wouldn't parse correctly` - I think you're right. It wouldn't parse correctly. But I somehow have the vague suspicion that it might actually parse as *something* instead of running into first leaf and giving up. Maybe it would even accept "the right language", even though it would generate a gibberish data structure out of it. – Andrey Tyukin Aug 17 '22 at 22:40
  • Either way the solution is definitely to model operator precedence in the grammar, but I don't know Packrat well enough to show how :) – Tim Aug 18 '22 at 06:31
  • > What it does instead is just choking on the first leaf it encounters. I was under the impression it will try all possible options till it finds one that matches the input string. Hence the idea of choking on the 1st wrong one seems weird as long as a valid path exists. – Alexandru Barbarosie Aug 18 '22 at 09:31
  • @AlexandruBarbarosie *"I was under the impression it will try all possible options till it finds one that matches the input string"* - you're correct, and that's exactly the problem: it finds one that matches, it "eats" a part of the input, but then it finds that it can no longer "eat" whatever is coming after that, so it stops. Unlike your average LR-parser generated by yacc, it will not attempt to regurgitate the already consumed sub-`operator` and to ruminate and chew and eat it again. It's not a ruminant animal, it doesn't have this kind of backtracking behavior. – Andrey Tyukin Aug 18 '22 at 10:10
  • @AndreyTyukin does this mean that there must be only one unambiguous way to parse the input in order for this to work? – Alexandru Barbarosie Aug 18 '22 at 12:58
  • @AlexandruBarbarosie That's just a generally desirable property of a grammar: you really don't *ever* want to accept any inputs that could be parsed into more than one abstract syntax tree. Whether this property is automatically guaranteed by the flavor of parser you're currently using: not sure, would have to investigate and refresh some theory. – Andrey Tyukin Aug 18 '22 at 13:14