One-Character parenthesis matching

Question

Given the grammar rule (BNF, | means or):

x := a | x x | x + x | x + "x" | "x" + x | "x" + "x"

with the following rules:

the + operator is left-associative (a+a+a means (a+a)+a),
concatenation is left-associative (aaa means (aa)a, not a(aa)),
and + eats its operands lazily, i.e. it binds tighter than concatenation (aa+aa means a(a+a)a).

Problem: is this grammar ambiguous, i.e. is it possible to parse some string in two different ways?

Examples:

Allowed: a, a+a, a+"a", "a+a"+"a+a" (read as (a+a)+(a+a)), ""a"+"a""+"a" (read as ((a)+(a))+(a)), a+a+a, a+"a"+a.

Forbidden: "a+a", +"a", a++a, "a", a+"a, ""a+a"+a".
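Counting parse trees is a quick way to explore the question on small inputs. Below is a minimal Python sketch of a memoized, CYK-style count over substrings. It is not from the thread, and `count_parses` is a hypothetical name. It uses only the raw grammar and deliberately ignores the three rules above. A result of 0 therefore means the string is not in the language, and a result above 1 means the raw grammar alone is ambiguous for that string.

```python
from functools import lru_cache


def count_parses(s: str) -> int:
    """Count derivation trees of s under the raw grammar
        x := a | x x | x + x | x + "x" | "x" + x | "x" + "x"
    The associativity and lazy-eating rules are NOT applied here."""

    @lru_cache(maxsize=None)
    def n(i: int, j: int) -> int:
        # Number of ways the slice s[i:j] derives from x (no empty rule).
        if i >= j:
            return 0
        total = 1 if s[i:j] == "a" else 0
        for k in range(i + 1, j):
            # x x: split s[i:j] into two non-empty parts at k.
            total += n(i, k) * n(k, j)
            if s[k] != "+":
                continue
            # k is a '+': left operand s[i:k] is either x or "x" ...
            left_x = n(i, k)
            left_q = n(i + 1, k - 1) if s[i] == '"' and s[k - 1] == '"' else 0
            # ... and right operand s[k+1:j] is either x or "x".
            right_x = n(k + 1, j)
            right_q = (n(k + 2, j - 1)
                       if k + 1 < j and s[k + 1] == '"' and s[j - 1] == '"'
                       else 0)
            # The four '+' rules are all combinations of x / "x" operands.
            total += (left_x + left_q) * (right_x + right_q)
        return total

    return n(0, len(s))
```

With this sketch, every Forbidden example yields 0 and every Allowed example yields at least 1. However, a+a+a, a+"a"+a, aaa and aa+aa yield 2, 2, 2 and 5 respectively. The raw grammar is therefore ambiguous, and the real question is whether the associativity and lazy-eating rules always leave exactly one tree.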

Application: I hate escaping { and } in LaTeX, so I wanted to make a LaTeX dialect in which only one character needs to be escaped. Both { and } would be replaced by a single character, " for example, so that one could write ""1+2"/3"^"a+b" instead of {\frac{1+2}{3}}^{a+b}.

`"frac"1+2"3"^"a+b"` - ouch! The question of ambiguity aside, how would you hope to be able to visually parse that? — NPE, Dec 26 '14 at 09:25
Yes, that was my other problem :) : how to design an algorithm that determines which `"` are opening brackets and which are closing ones. But I thought there were compiler-compilers for that. — Carucel, Dec 26 '14 at 09:28
I am not thinking about the computer, I am thinking about the human reader. http://www.goodreads.com/quotes/9168-programs-must-be-written-for-people-to-read-and-only — NPE, Dec 26 '14 at 09:29
Nice quote! But perhaps if the editor automatically used a darker grey background for more deeply nested groups `"..."`, a human reader could interpret this code on the fly. — Carucel, Dec 26 '14 at 09:35
Usually I deal with the problem of escaping the curly brackets by writing `\newcommand{\set}[1]{\{#1\}}`. — David Eisenstat, Dec 26 '14 at 15:07
What is your derivation of "a+a"+"a+a"? I don't believe that it can be generated by that grammar. As you say, `x` does not derive `"a+a"`, so that expression cannot be derived from any of `x + x`, `"x" + x` or `x + "x"`, all of which would require that at least one of the arguments to the innermost `+` be an `x`. — rici, Dec 26 '14 at 21:23
@rici, you are completely right: I meant to add |"x"+"x" but forgot; I've updated the post. How is x:=x x | a ambiguous? — Carucel, Dec 27 '14 at 10:10
@user815305: `x := x x | a` has two right-most derivations for `a a a`: `x->x x->x x x->x x a->x a a->a a a` and `x->x x->x a->x x a->x a a->a a a`. The first one corresponds to `{a{aa}}` and the second to `{{aa}a}` — rici, Dec 28 '14 at 01:54
@rici Yes you're right. The concatenation operator is supposed to be left associative. — Carucel, Dec 28 '14 at 08:15
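rici's two derivations of a a a can be reproduced by brute force. The following Python sketch enumerates every tree for x := x x | a and writes each concatenation node as {left right}. The `bracketings` helper and its `left_assoc` flag are hypothetical. `left_assoc=True` is one assumed way to impose left associativity: only a single a is allowed as the right operand, which amounts to the rule x := x a.

```python
from functools import lru_cache


def bracketings(s: str, left_assoc: bool = False) -> set:
    """All parse trees of s under x := x x | a, each concatenation node
    written as {left right}. With left_assoc=True the right operand of a
    concatenation must be a single 'a' (assumed encoding: x := x a)."""

    @lru_cache(maxsize=None)
    def trees(i: int, j: int) -> frozenset:
        if j - i == 1:
            return frozenset({"a"}) if s[i] == "a" else frozenset()
        out = set()
        for k in range(i + 1, j):
            if left_assoc and j - k != 1:
                continue  # right operand would itself be a concatenation
            for left in trees(i, k):
                for right in trees(k, j):
                    out.add("{" + left + right + "}")
        return frozenset(out)

    return set(trees(0, len(s)))
```

`bracketings("aaa")` returns `{a{aa}}` and `{{aa}a}`, the two trees from rici's comment. With `left_assoc=True`, only `{{aa}a}` remains, which is the reading the question intends.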

rns · Accepted Answer · 2014-12-28T09:16:51.063


Here is a quick and dirty script using Marpa::R2 (a Perl interface to Marpa, a general BNF parser). It parses the inputs with the grammar you've provided and with a modified version of it, which supports lazy eating and left associativity but doesn't forbid "a": code, output.

The grammar is not ambiguous for the inputs you've provided; otherwise parse() would have thrown an exception.

Hope this helps.

P.S. Using Marpa's general BNF parsing capability to provide a frontend with better syntax for TeX (among others) was discussed in the Marpa community.

Update: re the asker's comment

This grammar (in Marpa SLIF DSL, || means lower precedence)

x ::= a
   ||    x     '+'     x
   |     x     '+' '"' x '"'
   | '"' x '"' '+'     x
   | '"' x '"' '+' '"' x '"'
   ||    x             x

parses all the inputs in the question unambiguously except "a+a"+"a+a", which may need an additional "x" alternative. As rici helpfully points out in the comment below, that alternative would make the grammar ambiguous (more on that in the next paragraph): code, output.

Overall, with double quotes " serving as parens and '+' as, well, plus, it is tempting to add an explicit operator with lower precedence than '+', e.g. '.' for concatenation. That makes it a classic term/factor grammar, which can be expressed in Marpa SLIF DSL as follows:

x ::= a
  || '"' x '"' assoc => group
  || x '+' x
  || x '.' x
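With '.' as an explicit concatenation operator, the question from the comments (which " opens and which closes) has a purely positional answer. A quote where an operand is expected can only open a group. A quote where an operator or the end of input is expected can only close one. A minimal recursive-descent sketch in Python illustrates this. It assumes that the SLIF levels above are listed tightest first (a and groups, then '+', then '.') and that both operators are left-associative. `parse` and its tuple AST are hypothetical, not Marpa's output.

```python
def parse(src: str):
    """Recursive-descent parser for
        x ::= a || '"' x '"' || x '+' x || x '.' x
    ('+' binds tighter than '.', both left-associative).
    Returns a nested-tuple AST or raises SyntaxError."""
    pos = 0

    def peek():
        return src[pos] if pos < len(src) else None

    def take(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def atom():
        if peek() == "a":
            take("a")
            return "a"
        take('"')             # operand position: a quote can only open a group
        inner = concat_level()
        take('"')             # after a complete x: a quote can only close it
        return ("group", inner)

    def plus_level():
        node = atom()
        while peek() == "+":
            take("+")
            node = ("+", node, atom())          # left-associative
        return node

    def concat_level():
        node = plus_level()
        while peek() == ".":
            take(".")
            node = (".", node, plus_level())    # left-associative
        return node

    tree = concat_level()
    if pos != len(src):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree
```

For example, `parse("a.a+a")` gives `('.', 'a', ('+', 'a', 'a'))`, i.e. a(a+a). `"a"a"a"` is rejected because adjacency without '.' is not part of this grammar.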

Update 1:

# input: "a+a"+"a+a"
Setting trace_terminals option
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c2 e2: a; value="a"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c4 e4: a; value="a"
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c7 e7: '"'; value="""
Lexer "L0" accepted lexeme L1c8 e8: a; value="a"
Error in SLIF parse: No lexeme found at line 1, column 9
* String before error: "a+a"+"a
* The error was at line 1, column 9, and at character 0x002b '+', ...
* here: +a"
Marpa::R2 exception at C:\cygwin\home\Ruslan\Marpa-R2-work\q27655176.t line 63.

Progress report is:
F3 @7-8 L1c7-8 x -> a .
R7:6 @0-8 L1c1-8 x -> '"' x '"' '+' '"' x . '"'
# ast dump:
undef

edited Dec 28 '14 at 09:16

answered Dec 26 '14 at 13:46

rns


If you allow `x->"x"` then you have ambiguity, since `"a"a"a"` can be parsed as either `{a{a}a}` or `{a}a{a}`. (Replace the `a`s with more complicated expressions involving `+` to see more interesting ambiguities.) – rici Dec 26 '14 at 21:29
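rici's point can be checked by listing the ways the quotes can pair up. This Python sketch uses only x := a | x x | "x" (`quote_readings` is a hypothetical helper). It writes each matched pair as { } and omits concatenation structure, so two results differ exactly when the quotes are matched differently.

```python
from functools import lru_cache


def quote_readings(s: str) -> set:
    """Distinct quote matchings of s under x := a | x x | "x".
    Each matched pair is rendered as { }; concatenation grouping is dropped."""

    @lru_cache(maxsize=None)
    def r(i: int, j: int) -> frozenset:
        out = set()
        if s[i:j] == "a":
            out.add("a")
        # "x": the whole span is wrapped in a pair of quotes
        if j - i >= 2 and s[i] == '"' and s[j - 1] == '"':
            out.update("{" + m + "}" for m in r(i + 1, j - 1))
        # x x: every split into two non-empty parts
        for k in range(i + 1, j):
            out.update(left + right for left in r(i, k) for right in r(k, j))
        return frozenset(out)

    return set(r(0, len(s)))
```

`quote_readings('"a"a"a"')` returns exactly `{a{a}a}` and `{a}a{a}`, the two readings in the comment above.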
I couldn't get Python working to test your code, but does it work, and is it ambiguous, if the grammar is modified as I did in the question? ANTLR complains that the grammar x : 'a' | x x | x '+' x | x '+' '"' x '"' | '"' x '"' '+' x | '"' x '"' '+' '"' x '"'; is left-recursive, and the Gold Parser simply rejects "a+"a""+a. – Carucel Dec 27 '14 at 10:36
The update is posted, hope it answers your question. Unlike others, Marpa parses literally everything you can express in BNF, incl. left, right and middle recursions. There is an effort to [port Marpa (the c lib) to python](https://github.com/koo5/new_shit/tree/master/marpa_cffi). – rns Dec 27 '14 at 15:05
Sorry for my foolishness, but how is `"a+a"+"a+a"` ambiguous? While your code gives an error, I can only think of one way to parse this: (['x','"',['x',['x',['a','a']],'+',['x',['a','a']]],'"','+','"',['x',['x',['a','a']],'+',['x',['a','a']]],'"']), or in a more natural notation (a+a)+(a+a). – Carucel Dec 27 '14 at 15:36
No prob, that's just me not being, well, unambiguous enough in my writing: I was trying to say that `"a+a"+"a+a"` can't be parsed at all with the grammar and that all other inputs are parsed OK and unambiguously. Of course `"a+a"+"a+a"` isn't ambiguous. – rns Dec 27 '14 at 16:30
But why can `"a+a"+"a+a"` not be parsed? The following seems like a perfect parsing of `"a+a"+"a+a"`: first choose the fifth option in the grammar, for each `x` then choose the second option in the grammar, and for each last `x` choose the first option. – Carucel Dec 28 '14 at 08:21
I found an ambiguous sentence: `"a+"a"a"+a"a"+a` can be interpreted as `(a+(a)a)+a(a)+a` or `(a+(a(a)+a)a)+a` – Carucel Dec 28 '14 at 10:47
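The ambiguity of this sentence can be confirmed mechanically. In the Python sketch below (`readings` is a hypothetical helper), the four + rules of the question's grammar are folded into one: each operand of + is either x or "x". Matched quotes are rendered as ( ) and operator grouping is omitted, so the result contains one string per distinct quote pairing.

```python
from functools import lru_cache


def readings(s: str) -> set:
    """Distinct quote matchings of s under the question's grammar
        x := a | x x | x + x | x + "x" | "x" + x | "x" + "x"
    Each matched pair of quotes is rendered as ( ); operator grouping is
    not shown, so results differ only where the quotes pair up differently."""

    @lru_cache(maxsize=None)
    def x(i: int, j: int) -> frozenset:
        out = set()
        if s[i:j] == "a":
            out.add("a")
        for k in range(i + 1, j):
            # x x
            out.update(u + v for u in x(i, k) for v in x(k, j))
            if s[k] == "+":
                # The four '+' rules: each operand is either x or "x".
                lefts = x(i, k) | quoted(i, k)
                rights = x(k + 1, j) | quoted(k + 1, j)
                out.update(u + "+" + v for u in lefts for v in rights)
        return frozenset(out)

    def quoted(i: int, j: int) -> frozenset:
        # Operand "x" occupying s[i:j], quotes included.
        if j - i >= 2 and s[i] == '"' and s[j - 1] == '"':
            return frozenset("(" + m + ")" for m in x(i + 1, j - 1))
        return frozenset()

    return set(x(0, len(s)))
```

Both `(a+(a)a)+a(a)+a` and `(a+(a(a)+a)a)+a` appear in `readings('"a+"a"a"+a"a"+a')`, so for this string the grammar alone does not determine which quotes match. For comparison, `readings('"a+a"+"a+a"')` contains only the single reading `(a+a)+(a+a)`.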

re "a+a"+"a+a" — it can be parsed if we allow equal precedence for `x ::= a` and `'+'` rules — https://gist.github.com/rns/57bacf93e3ea0ade095f – rns Dec 28 '14 at 11:45
re "a+"a"a"+a"a"+a -- can't be parsed with the grammar in the above gist. – rns Dec 28 '14 at 11:51
@rns: Does it parse `a+"a"a`? (`x->xx->xa->x+"x"a->x+"a"a->a+"a"a`). That and a similar derivation of `a"a"+a` seem valid to me, in which case one parse of the above expression is `"x+a"+x` (with the `x`'s replaced with those two expressions, respectively). – rici Dec 29 '14 at 14:12
@rici: yes [it](https://gist.github.com/rns/57bacf93e3ea0ade095f#file-q27655176-pl) does, the asts: `['x',['x',['a']],['x','"',['x',['a']],'"','+',['x',['a']]]]` and `['x',['x',['x',['a']],'+','"',['x',['a']],'"'],['x',['a']]]` – rns Dec 30 '14 at 05:43
@rns: Then why can't `"a+"a"a"+a"a"+a` be parsed? Is it because Marpa refuses to parse ambiguous expressions? (I believe that user815305 is correct about it being ambiguous.) – rici Dec 30 '14 at 20:06
@rici: it can, if precedence is removed (`s/|| x x/| x x/`), as shown by [this gist](https://gist.github.com/rns/64b586b194362313b34c). And yes, user815305 is correct: it is ambiguous. Marpa handles ambiguity by design (note `Ambiguous parse: 3 alternatives:` and `ast [0-3]` in [the output](https://gist.github.com/rns/64b586b194362313b34c#file-output)), because the algorithm is designed to parse anything you can write in BNF, literally so. Marpa does the job, unlike others, as we've seen in user815305's comment about ANTLR and GoldParser. – rns Dec 31 '14 at 07:21
@rici: precedence was added to meet the left-associativity and "lazy eating" requirements in the question, and ambiguity doesn't play nicely with precedence and associativity at the grammar level. However, precedence and associativity criteria can be applied directly to the ASTs or via a parse-forest grammar per Grune & Jacobs 2nd ed., sec. 3.7.4, p. 91, e.g. [here](https://github.com/rns/MarpaX-ASF-PFG/blob/master/t/02_expr_number.t). – rns Dec 31 '14 at 07:30
@rici: Happy 2015, btw. :) – rns Dec 31 '14 at 07:31
@rns And Happy 2014 to you, too! (9 hours early by my wallclock, but it's all relative.) wrt the question, that seems like an odd effect of "precedence". It's more like the static precedence in a bison lalr parser than the dynamic precedence which you can use with a glr parser (the glr parser is roughly the same as marpa, except that bison insists on a single parse so if you have an ambiguous grammar, you need to figure out how to merge ambiguities.) IMHO, in a parser which claims to handle all CFGs including ambiguous ones, precedence should only be applied between otherwise plausible parses. – rici Dec 31 '14 at 20:03
