4

I've been staring myself blind on this problem I have and I guess this will probably be a real stupid question. But I have to swallow my pride.

I have this combinator parser that doesn't backtrack like I thought it would. I've been reducing it down to a small example without entirely removing context. Feels like "foobar"-examples are just harder to read. Here I go:

@RunWith(classOf[JUnitRunner])
class ParserBacktrackTest extends RegexParsers with Spec with ShouldMatchers {
  override def skipWhitespace = false

  lazy val optSpace = opt(whiteSpace)
  lazy val number = """\d+([\.]\d+)?""".r
  lazy val numWithOptSpace = number <~ optSpace

  private def litre = numWithOptSpace <~ ("litre" | "l")
  def volume = litre ^^ { case _ => "volume" }

  private def namedPieces = numWithOptSpace <~ ("pcs") ^^ { case _ => "explPcs" }
  private def implicitPieces = number ^^ { case _ => "implPcs" }
  protected def unitAmount = namedPieces | implicitPieces

  def nameOfIngredient = ".*".r

  def amount = volume | unitAmount
//  def amount = unitAmount
  protected def ingredient = (amount <~ whiteSpace) ~ nameOfIngredient

  describe("IngredientParser") {
    it("should parse volume") {
      shouldParse("1 litre lime")
    }
    it("should parse explicit pieces") {
      shouldParse("1 pcs lime")
    }
    it("should parse implicit pieces") {
      shouldParse("1 lime")
    }
  }

  def shouldParse(row: String) = {
    val result = parseAll(ingredient, row)
    result match {
      case Success(value, _) => println(value)
      case x => println(x)
    }
    result.successful should be(true)
  }
}

So what happens is that the third test fails:

(volume~lime)
(explPcs~lime)
[1.4] failure: string matching regex `\s+' expected but `i' found

1 lime
   ^

So it seems the litre-parser consumed the l and then it failed when it couldn't find any space. But I would have thought that it would backtrack then and try the next production rule. Obviously the implicitPieces parser parses this line because if I remove the preceding volume parser (remove the comment), it succeeds

(implPcs~litre lime)
(explPcs~lime)
(implPcs~lime)

Why isn't amount backtracking? What am I misunderstanding?

Viktor Hedefalk
  • 3,572
  • 3
  • 33
  • 48

2 Answers2

4

I just want to post a minimal example illustrating my misunderstanding. I thought this would succeed:

  def foo = "foo" | "fo"
  def obar = "obar"

  def foobar = foo ~ obar

  describe("foobar-parser") {
    it("should parse it") {
      shouldParseWith(foobar, "foobar")
    }
  }

but backtracking over | doesn't work that way. The disjunctive parser will consume "foo" and never give it back.

It has to be normalized so the disjunction is moved to the top level:

def workingFooBar = ("foo" ~ obar) | ("fo" ~ obar)
Viktor Hedefalk
  • 3,572
  • 3
  • 33
  • 48
  • Is there any parser library in scala where the transformation is done automatically? – HRJ Apr 03 '14 at 07:38
2

It doesn't backtrack because for 1 lime

  • ingredient starts out with amount
  • amount starts with volume
  • volume starts with litre, and
  • litre successfully consumes 1 l of 1 lime

So litre, volume and amount were all successful! This is why then the whole thing continues with the second part of ingredient, namely whiteSpace.

HTH!

fotNelton
  • 3,844
  • 2
  • 24
  • 35
  • I just don't get that? I mean it has to be able to backtrack when it fails. A non-backtracking CFG parser wouldn't be worth much. I mean this works: `{def foo = "foo"; def bar = "bar"; def baz = "baz"; def foobarbaz = foo ~ bar ~ baz; def foobarfoo = foo ~ bar ~ foo; def p = foobarbaz | foobarfoo; describe("foobar-parser") { it("should parse it") { shouldParseWith(p, "foobarfoo") } }}` – Viktor Hedefalk Feb 08 '12 at 13:43
  • 1
    @hedefalk In this example `foobarbaz` fails, so `|` tries the alternative. In your question, `volume` succeeds, so `|` doesn't try the alternative. – Daniel C. Sobral Feb 08 '12 at 16:05
  • Yes, that's what I was trying to show. @hedefalk: It may be that it's just a matter of priority so that you would only need to rework a minor part of your grammar. As Daniel said, in `a | b` the right-hand side will only be evaluated once the left-hand side fails which is not the case in your OP. See if you can rearrange one or more rules to get the priorities that you want to have. – fotNelton Feb 08 '12 at 16:19
  • I realize I didn't understand how these parsers work at all. Changing: `def amount = (volume | unitAmount) <~ whiteSpace` to `def amount = (volume <~ whiteSpace) | (unitAmount <~ whiteSpace)` makes the backtracking work. So it only backtracks over a disjunction (`|`) but sequencing with `~` is gready? – Viktor Hedefalk Feb 08 '12 at 16:23
  • But what are then `~!` for? I still don't get it. I'm probably real slow today :´( – Viktor Hedefalk Feb 08 '12 at 16:25
  • Ah, NOW I get it. "the right-hand side will only be evaluated once the left-hand side fails". So it's the disjunction that makes it greedy. If I have `(p | q)` and `p` succeeds, `q` will never be tried in a backtracking of a composition…? – Viktor Hedefalk Feb 08 '12 at 16:33
  • Yep, there's no need to try `q` in that case and that's why it didn't backtrack. – fotNelton Feb 08 '12 at 17:47