
Consider:

import scala.util.parsing.combinator._

object TreeParser extends JavaTokenParsers {

    lazy val expr: Parser[String] = decimalNumber | sum
                                                  //> expr: => TreeParser.Parser[String]
    lazy val sum: Parser[String] = expr ~ "+" ~ expr ^^ { case a ~ plus ~ b => s"($a)+($b)" }
                                                  //> sum: => TreeParser.Parser[String]
    println(parseAll(expr, "1 + 1"))              //> TreeParser.ParseResult[String] = [1.3] failure: string matching regex
                                                  //| `\z' expected but `+' found
}

The same story with fastparse:

import fastparse.all._
val expr: P[Any] = P("1" | sum)
val sum: P[Any] = expr ~ "+" ~ expr
val top = expr ~ End
println(top.parse("1+1")) // Failure(End:1:2 ..."+1")

Both parsers are great at discovering that taking the first literal alone is a bad idea, but neither tries to fall back to the sum production. Why?

I understand that the parser takes the first branch that can successfully eat up a part of the input string and exits. Here, the "1" of expr matches the first input character and parsing completes. In order to grab more, we need to make sum the first alternative. However, the plain stupid

lazy val expr: Parser[String] = sum | "1"

ends up with a stack overflow. The library authors therefore approach it from another side:

val sum: P[Any] = P( num ~ ("+".! ~/ num).rep )
val top: P[Any]   = P( sum ~ End )

Here, sum starts with a terminal, which is fine, but this syntax is more verbose. Furthermore, it produces a terminal followed by a list, which is good for a reduction operator like sum but is difficult to map to a series of binary operators.
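That said, the (terminal, list) shape can still be turned into nested binary applications with a fold. A minimal sketch with plain strings instead of tree nodes (no parser library involved; the values below stand in for what a successful parse of "1+2+3" would yield):

```scala
// first ~ ("+" ~ num).rep conceptually yields a head plus a list of (op, term) pairs
val first = "1"
val rest  = List(("+", "2"), ("+", "3"))

// foldLeft nests the pairs left-associatively: ((1+2)+3)
val nested = rest.foldLeft(first) { case (acc, (op, t)) => s"($acc$op$t)" }
println(nested)   // ((1+2)+3)
```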

What if your language defines an expression that admits a binary operator? You want to match every occurrence of expr op expr and map it to a corresponding tree node:

expr ~ "op" ~ expr ^^ { case a ~ _ ~ b => BinOp(a, b) }

How do you do that? In short, I want a greedy parser that consumes the whole string. This is what I mean by 'greedy', rather than a greedy algorithm that jumps into the first wagon and ends up in a dead end.
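For reference, the usual workaround with scala-parser-combinators is to keep the grammar iterative (term followed by repeated "+" term) and rebuild left-associated tree nodes with a fold afterwards. A sketch, where Expr, Num, BinOp and ExprParser are illustrative names, not library types:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

sealed trait Expr
case class Num(n: String) extends Expr
case class BinOp(op: String, l: Expr, r: Expr) extends Expr

object ExprParser extends JavaTokenParsers {
  val term: Parser[Expr] = decimalNumber ^^ (Num(_))
  // expr ::= term {"+" term} -- no left recursion, consumes the whole input
  val expr: Parser[Expr] = term ~ rep("+" ~ term) ^^ {
    case first ~ rest =>
      rest.foldLeft(first) { case (acc, op ~ t) => BinOp(op, acc, t) }
  }
}

// "1 + 2 + 3" comes out left-associated: ((1+2)+3)
println(ExprParser.parseAll(ExprParser.expr, "1 + 2 + 3"))
```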

Community
  • I think that [artima guide](http://artima.com/pins1ed/combinator-parsing.html#31.10) explains it. It explains why should we prefer `expr ::= term {"+" term}` over backtracking `expr ::= term + expr ∣ term` and falling into infinite recursion `expr ::= expr + term ∣ term`. – Valentin Tihomirov Dec 24 '15 at 14:57
  • I read that undesire to backtrack is the feature of all PEG parsers: [Whereas regular expression matchers may start by matching greedily, but will then backtrack and try shorter matches if they fail and CFG tries every possibility, PEG's `*`, `+`, and `?` operators always behave greedily, consuming as much input as possible and never backtracking: Expression `a*` will always consume as many a's as are consecutively available in the input string, causing `(a* a)` to fail persistently.](https://en.wikipedia.org/wiki/Parsing_expression_grammar#Operational_interpretation_of_parsing_expressions) – Little Alien Nov 02 '16 at 17:55
  • Probably one of the reasons is that once they have matched the prefix, they call a `semantic action` which cannot be 'backtracked'. – Little Alien Nov 02 '16 at 18:51

2 Answers


As I have found here, we need to replace the | alternative operator with the little-known longest-match operator |||

import scala.util.parsing.combinator._

object BacktrackGreedy extends JavaTokenParsers {
    //lazy val expr: Parser[String] = decimalNumber | sum
    lazy val backtrackGreedy: Parser[String] = decimalNumber ||| sum

    lazy val sum: Parser[String] = decimalNumber ~ "+" ~ backtrackGreedy ^^ { case a ~ plus ~ b => s"($a)+($b)" }

    println(parseAll(backtrackGreedy, "1 + 1")) // [1.6] parsed: (1)+(1)
}

The order of alternatives does not matter with this operator. To stop the stack overflow, we still need to eliminate the left recursion: sum = expr + expr becomes sum = number + expr.

Another answer says that we need to normalize, that is, instead of

  def foo = "foo" | "fo"
  def obar = "obar"

  def foobar = foo ~ obar

we need to use

def workingFooBar = ("foo" ~ obar) | ("fo" ~ obar)

But the first solution is more striking.
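To see why the normalization matters, here is a tiny hand-rolled sketch of PEG-style ordered choice (not the combinator library; all names are illustrative). `alt` commits to the first branch that succeeds, so a later failure in the sequence is never propagated back into the choice; moving the choice outward fixes it:

```scala
// Each parser returns the remaining input on success, None on failure.
type P = String => Option[String]

def lit(s: String): P = in => if (in.startsWith(s)) Some(in.drop(s.length)) else None
def alt(a: P, b: P): P = in => a(in).orElse(b(in))   // ordered choice, commits to first success
def seq(a: P, b: P): P = in => a(in).flatMap(b)      // sequence

val obar = lit("obar")
val broken  = seq(alt(lit("foo"), lit("fo")), obar)            // "foo" wins, then obar fails
val working = alt(seq(lit("foo"), obar), seq(lit("fo"), obar)) // choice moved outward

println(broken("foobar"))    // None
println(working("foobar"))   // Some("") -- the "fo" branch succeeds
```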

Community

The parser does backtrack. Try `val expr: P[String] = P(("1" | "1" ~ "+" ~ "1").!)` and `expr.parse("1+1")`, for example.

The problem is in your grammar. `expr` parses `1`, and that is a successful parse by your definition. Then `sum` fails, and now you want to blame the dutiful `expr` for what happened?

There are plenty of examples on how to deal with binary operators. For example, the first example here: http://lihaoyi.github.io/fastparse/

lastland
  • I have seen those examples. You do not notice that the question actually is why parser is not greedy. The input string corresponds to the grammar. It can be parsed as 1 ~ sum ~ 1 expression. However, it does not. I just want to know why should I define grammar production rules as `expr = expr ~ rep(op ~ expr)` instead of simply `expr op expr`. I guess that the nature of the parser is the reason. It depends on the parser which context-free production rules it can process. I want expert in the field to clarify. – Valentin Tihomirov Nov 25 '15 at 15:07
  • It can get greedy. However, getting greedy does not solve your problem as you have already seen by switching positions of `1` and `sum`. – lastland Nov 25 '15 at 15:17
  • However, if your question is that if a parser can parse the grammar you have defined here, I think this is perhaps what you are looking for: http://richard.myweb.cs.uwindsor.ca/PUBLICATIONS/PADL_08.pdf – lastland Nov 25 '15 at 15:40
  • You express yourself mysteriously. Why should one interpret compiler failed at switching positions of `1` and `sum` as a greedy parser? – Valentin Tihomirov Nov 25 '15 at 15:41
  • Let's assume the parser is greedy. What will happen? `expr` chooses `sum` over `1` => `sum` parses `expr` => `expr` is greedy, so it again chooses `sum` over `1`... This will only result in the same endless loop as in `P(sum | "1")`, which is usually referred to as a left recursive grammar. My point is that what you want requires more than simply making the parser greedy. There are techniques to get what you want, but with a whole different algorithm and higher time/space complexity -- for details check the link I provided earlier. – lastland Nov 25 '15 at 16:17