Disallowing unnecessary outermost brackets in a BNFC-grammar

Question

This is a continuation of this question I asked earlier about a BNFC-grammar for propositional logic. I got it working with parentheses, as per the definition, but I would now like to extend the grammar to work without parentheses, with one catch: no unnecessary outer parentheses are allowed.

For example, the atomic sentence a should be allowed, but (a) should not be recognized. The sentence (a => b) & c should also be allowed, but ((a => b) & c) should not, and so forth. The last example highlights the necessity for parentheses. The precedence levels are

1. equivalence <=> and implication =>,
2. conjunction & and disjunction |,
3. negation -, and
4. atoms.

The higher the level, the more tightly it binds, so it is grouped first when parsing.

I got the grammar working with the unnecessary parentheses, by setting precedence levels to the different operators via recursion:

IFF     .   L     ::=   L   "<=>" L1  ;
IF      .   L     ::=   L   "=>"  L1  ;
AND     .   L1    ::=   L1  "&"   L2  ;
OR      .   L1    ::=   L1  "|"   L2  ;
NOT     .   L2    ::=       "-"   L3  ;
NOT2    .   L2    ::=       "-"   L2  ;
P       .   L3    ::=   Ident         ;

_       .   L     ::=   L1            ;
_       .   L1    ::=   L2            ;
_       .   L2    ::=   L3            ;
_       .   L3    ::=   "(" L ")"     ;

Now the question is, how do I not allow the outer parentheses, the allowance of which is caused by the last rule L3 ::= "(" L ")";? It is strictly necessary for allowing parentheses inside an expression, but it also allows them on the edges. I guess I need some extra rule for removing ambiguity, but what might that be like?

This grammar also results in about 6 reduce/reduce conflicts, but aren't those pretty much inevitable in recursive definitions?

You could try duplicating some of your productions, forbidding a production that goes directly from `L` to `L3` by having a different `L'` in the productions `L ::= L' "<=>" L1` and `L ::= L' "=>" L1`. In this case, the `L'` duplicates of your productions can be allowed to convert down to `L3`, but `L` itself cannot. So, the `'` part indicates that the production has participated in an actual expansion, not just a conversion for the sake of precedence. `'` duplicates can have parentheses around them, but non-`'` ones cannot. – Welbog, Apr 15 '20 at 14:30
@Welbog So basically I would have these 4 productions for the first level: `L ::= L' "<=>" L1`, `L ::= L' "=>" L1`, `L' ::= L' "<=>" L1` and `L' ::= L' "=>" L1`. Then with `L3` I would do a parenthesization of the `L'` versions like so: `_. L3 ::= "(" L' ")";`. – sesodesa, Apr 15 '20 at 14:45

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

You can do this by simply banning the parenthesised form from the top level. This requires writing the precedence hierarchy in a different fashion, in order to propagate the restriction through the hierarchy. In the following, the r suffix indicates that the production is "restricted" to not be a parenthesised form.

I also fixed the reduce/reduce conflicts by eliminating one of the NOT productions. See below.

(I hope I got the BNFC right. I wrote this in bison and tried to convert the syntax afterwards.)

_       .   S     ::=   L0r             ;

IFF     .   L0r   ::=   L0 "<=>" L1     ;
IF      .   L0r   ::=   L0 "=>"  L1     ;

AND     .   L1r   ::=   L1 "&"   L2     ;
OR      .   L1r   ::=   L1 "|"   L2     ;

NOT     .   L2r   ::=       "-"   L2    ;
ID      .   L2r   ::=   Ident           ;

PAREN   .   L3    ::=   "(" L0 ")"      ;

_       .   L0r   ::=   L1r             ;
_       .   L1r   ::=   L2r             ;

_       .   L0    ::=   L0r             ;
_       .   L1    ::=   L1r             ;
_       .   L2    ::=   L2r             ;

_       .   L0    ::=   L3              ;
_       .   L1    ::=   L3              ;
_       .   L2    ::=   L3              ;

(Edit: I changed the IFF, IF, AND and OR rules by removing the restriction (r) from the first arguments. That allows the rules to match expressions which start with a parenthesis without matching the PAREN syntax.)
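
The effect of the restriction can also be sketched outside BNFC as a small hand-written recursive-descent recognizer in Python. This is only an illustration under stated assumptions, not BNFC output: the names `tokenize`, `Parser` and `accepts` are hypothetical, identifiers are simplified to a letter followed by letters, digits, `_` or `'`, and instead of duplicating non-terminals it parses the unrestricted L0 and then rejects the input if the root of the tree is the PAREN form, which matches what `S ::= L0r` expresses.

```python
import re

# One token per match: operators, parentheses, or a simplified identifier.
TOKEN_RE = re.compile(r"\s*(<=>|=>|[&|()\-]|[A-Za-z][A-Za-z0-9_']*)")


def tokenize(text):
    """Split the input into tokens, raising SyntaxError on unknown characters."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser for the unrestricted levels L0, L1, L2."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def level0(self):
        # L0 ::= L0 ("<=>" | "=>") L1, written as a left-associative loop.
        node = self.level1()
        while self.peek() in ("<=>", "=>"):
            op = self.take()
            node = (op, node, self.level1())
        return node

    def level1(self):
        # L1 ::= L1 ("&" | "|") L2, also left-associative.
        node = self.level2()
        while self.peek() in ("&", "|"):
            op = self.take()
            node = (op, node, self.level2())
        return node

    def level2(self):
        # L2 ::= "-" L2 | Ident | "(" L0 ")"
        tok = self.take()
        if tok == "-":
            return ("-", self.level2())
        if tok == "(":
            inner = self.level0()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return ("PAREN", inner)
        if tok is not None and tok[0].isalpha():
            return ("ID", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def accepts(text):
    """True if text parses and its outermost form is not a parenthesised one."""
    try:
        parser = Parser(tokenize(text))
        tree = parser.level0()
        if parser.peek() is not None:  # leftover tokens
            return False
    except SyntaxError:
        return False
    return tree[0] != "PAREN"
```

With this sketch, `accepts("(a => b) & c")` holds while `accepts("((a => b) & c)")` does not; parentheses inside an expression, as in `-(a)`, are still accepted because only the root of the tree is checked.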

If you also wanted to disallow redundant internal parentheses (like ((a & b))), you could change the PAREN rule to

PAREN   .   L3    ::=   "(" L0r ")"     ;

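Before the edit described above, that would have made the L0 rules unnecessary; now that L0 is used as the first argument of IFF and IF, they are still needed.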

A variant approach which uses fewer unit productions can be found in the answer by @IraBaxter to Grammar for expressions which disallows outer parentheses.

Side note:

This grammar also results in about 6 reduce/reduce conflicts, but aren't those pretty much inevitable in recursive definitions?

No, recursive grammars can and should be unambiguous. Reduce/reduce conflicts are not inevitable, and almost always indicate problems in the grammar. In this case, they are the result of the redundant productions for the unary NOT operator. Having two different productions which can both match "-" L3 is obviously going to lead to an ambiguity, and ambiguities always produce parsing conflicts.
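
The ambiguity can be made concrete by counting derivations. The following Python sketch assumes a fragment of the question's grammar reduced to L2 and L3 without parentheses; `count_derivations`, `QUESTION_FRAGMENT` and `FIXED_FRAGMENT` are hypothetical names, and the `Ident` check is simplified to "starts with a letter". It is a brute-force counter for this small fragment, not a general parser (it would not terminate on left-recursive rules).

```python
from functools import lru_cache

# L2 and L3 from the question, with both NOT productions present.
QUESTION_FRAGMENT = {
    "L2": [("-", "L3"), ("-", "L2"), ("L3",)],
    "L3": [("Ident",)],
}

# The same fragment with the redundant NOT production removed.
FIXED_FRAGMENT = {
    "L2": [("-", "L2"), ("L3",)],
    "L3": [("Ident",)],
}


def count_derivations(grammar, symbol, tokens):
    """Count the distinct parse trees of `symbol` over the whole token list."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(sym, start, end):
        if sym not in grammar:
            # Terminal: must cover exactly one token.
            if end - start != 1:
                return 0
            tok = tokens[start]
            if sym == "Ident":
                return int(tok[0].isalpha())
            return int(tok == sym)
        return sum(count_seq(rhs, start, end) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, start, end):
        if not rhs:
            return int(start == end)
        first, rest = rhs[0], rhs[1:]
        total = 0
        # Every symbol consumes at least one token in this fragment.
        for mid in range(start + 1, end - len(rest) + 1):
            total += count(first, start, mid) * count_seq(rest, mid, end)
        return total

    return count(symbol, 0, len(tokens))
```

For the tokens `["-", "a"]`, `count_derivations(QUESTION_FRAGMENT, "L2", ...)` gives 2 (one tree per NOT production), while `FIXED_FRAGMENT` gives 1.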

answered Apr 15 '20 at 15:03

rici

This gives me an idea of how this should be done, but you have ignored a few rules imposed by BNFC, so this won't compile off the bat. The `r` suffixes should be converted to numbers and the symbol `|` is not recognized like it is in many other variants of BNF. The last rules should all be on their own rows. – sesodesa Apr 15 '20 at 15:18
@sesodesa: Sorry, I don't use BNFC. Do you have some kind of reference document? (Alternatively, I could include the bison version, which does compile and run :-) ) – rici Apr 15 '20 at 15:20
@Sesodesa: OK, I fixed the use of `|`. Assuming [this](https://bnfc.readthedocs.io/en/latest/lbnf.html#lexer) is the BNFC you are talking about, I see nothing which would make the use of `L1r` problematic. – rici Apr 15 '20 at 15:29
Well, the unedited answer simply didn't compile for me, as `bnfc` complained about those specific rules. The issue is that apparently rules being *coerced* need to have the same base identifier (a prefix string of letters) and the different coercion levels are separated by postfix numbers. This is explained [here](https://bnfc.readthedocs.io/en/latest/lbnf.html#precedence-levels). – sesodesa Apr 15 '20 at 15:41
@SeSodesa: OK, I'll read that and try to fix it. But the idea is clear, no? – rici Apr 15 '20 at 16:04
I still need to think about this more (draw up a few derivations based on these productions), but I think I get the gist of it. With this construction, we can't derive a string that starts with a `(`. By the way, I think the start symbol `S` also requires a label, for example `Start . S ::= L;`. Looks like BNFC also complains if there isn't at least one rule with no digits at the end, the so called base case of the `L`-type productions. – sesodesa Apr 15 '20 at 16:49
And the reason the sentences can't start with a `(` is because you've placed the variables `L1` and `L2` at the ends of the productions that utilize them. – sesodesa Apr 15 '20 at 17:01
@SeSodesa: Yeah, you're right, and that means I did it wrong. I was solving a different problem, which is how to ban expressions which start with a parenthesis. (That comes up pretty often, because in languages in which expressions are statements and statements don't need to be separated with semicolons, you have to avoid statement expressions which start with a parenthesis in order to avoid an ambiguity with function calls). But your case is different. I played around with BNFC a bit, and I have something which it doesn't complain about, so I'll edit the answer with that. – rici Apr 16 '20 at 16:28
