
[Also posted on the Instaparse mailing list, but posted here as well since I'm guessing this is a fairly general problem]

Consider the grammar

 D = (B|S)*
 S = 'S' B*
 B = 'B'

(This is Instaparse's version of BNF...)

B can occur by itself, or after S; if the latter, it should be considered part of the, er, S expression (no pun intended).

Example:

(require '[instaparse.core :as insta])

(-> "D = (B|S)*
     S = 'S' B*
     B = 'B'"
    insta/parser
    (insta/parses "BSBB"))

;;=>
([:D [:B "B"] [:S "S"] [:B "B"] [:B "B"]]
 [:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]]    ;; <------
 [:D [:B "B"] [:S "S" [:B "B"]] [:B "B"]])

I'd like only the second result to be returned, so that B is included inside S whenever possible and the other parses are eliminated. How should I change the grammar to achieve this?

More example expressions are shown in this gist.

JohnJ
  • Aren't you just trying to recognize `'B'* ('S' 'B'*)*`? That's completely unambiguous, and easy to write in BNF. – rici Mar 21 '15 at 23:57
  • @rici - indeed, that also works -- thanks! My question is actually a simplified version of a more complex grammar, for which the negative lookahead is indeed needed AFAICT. – JohnJ Mar 22 '15 at 14:46

1 Answer


You can use negative lookahead to require that a match of S is not immediately followed by another B, which forces S to consume every B that follows it:

(-> "D = (B|S)*
     S = 'S' B* !B
     B = 'B'"
    insta/parser
    (insta/parses "BSBB"))
;= ([:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]])

This works for all the examples in (the current version of) your gist as well.
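For a side-by-side check, here is a minimal sketch that builds both grammars and compares what `insta/parses` returns for the input from the question. The names `ambiguous` and `greedy-s` are made up for this illustration, and the sketch assumes Instaparse is on the classpath and required under the alias `insta`.

```clojure
(require '[instaparse.core :as insta])

;; Grammar from the question: S may stop before consuming every B,
;; so the same input can be grouped in more than one way.
(def ambiguous
  (insta/parser
    "D = (B|S)*
     S = 'S' B*
     B = 'B'"))

;; Grammar from this answer: the trailing !B rejects any match of S
;; that leaves a B immediately after it.
(def greedy-s
  (insta/parser
    "D = (B|S)*
     S = 'S' B* !B
     B = 'B'"))

;; The question lists three parse trees for this input.
(count (insta/parses ambiguous "BSBB"))
;;=> 3

;; With the lookahead only the desired tree remains.
(insta/parses greedy-s "BSBB")
;;=> ([:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]])
```

Calling `insta/parses` rather than `insta/parse` is useful here because it returns every parse, which makes any remaining ambiguity visible.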

Michał Marczyk