Resolving ambiguity in simple Instaparse grammar

Question

[Also posted on the Instaparse mailing list, but posted here as well since I'm guessing this is a fairly general problem]

Consider the grammar

 D = (B|S)*
 S = 'S' B*
 B = 'B'

(This is Instaparse's version of BNF...)

B can occur by itself, or after S; if the latter, it should be considered part of the, er, S expression (no pun intended).

Example:

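;; parser and parses are assumed to be referred from instaparse.core
(require '[instaparse.core :refer [parser parses]])

(-> "D = (B|S)*
     S = 'S' B*
     B = 'B'"
    parser
    (parses "BSBB"))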

;;=>
([:D [:B "B"] [:S "S"] [:B "B"] [:B "B"]]
 [:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]]    ;; <------
 [:D [:B "B"] [:S "S" [:B "B"]] [:B "B"]])

I'd like the parser to return only the second result, so that each B is included inside the preceding S whenever possible, and the other parses are eliminated. What do I need to change in my grammar to achieve this?

More example expressions are shown in this gist.

Aren't you just trying to recognize `'B'* ('S' 'B'*)*`? That's completely unambiguous, and easy to write in BNF. — rici, Mar 21 '15 at 23:57
@rici - indeed, that also works -- thanks! My question is actually a simplified version of a more complex grammar, for which the negative lookahead is indeed needed AFAICT. — JohnJ, Mar 22 '15 at 14:46

score 2 · Accepted Answer · answered Mar 21 '15 at 18:26

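You can use negative lookahead to require that a match of S is not followed by another B. An S can then only end once it has consumed every B that follows it, which rules out the other two parses: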

(require '[instaparse.core :as insta])

(-> "D = (B|S)*
     S = 'S' B* !B
     B = 'B'"
    insta/parser
    (insta/parses "BSBB"))
;= ([:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]])

This works for all the examples in (the current version of) your gist as well.
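
For this simplified grammar, the lookahead-free formulation rici suggests in the comments (`'B'* ('S' 'B'*)*`) can be sketched in Instaparse as below. This is only an illustration: the name `d-parser` is made up here, and the grammar assumes the top level is exactly "leading Bs, then S groups", which may not carry over to the more complex grammar mentioned in the comments.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical name; the grammar is rici's 'B'* ('S' 'B'*)* with the
;; S and B rules kept so the tree has the same node tags as the original.
(def d-parser
  (insta/parser
    "D = B* S*
     S = 'S' B*
     B = 'B'"))

;; insta/parses returns a sequence of every possible parse tree,
;; so its length shows whether the grammar is ambiguous for an input.
(def results (insta/parses d-parser "BSBB"))

(println (count results))
(println (first results))
```

Once the first S has been seen, a B can only be reached through the B* inside an S, so there is no alternative derivation that leaves a B standing alone after an S.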
