How to resolve this ambiguous grammar?

Question

I have written this grammar:

expr        : multExpr ( ('+' | '-') multExpr )*;
multExpr    : atom ( ('*' | '/') atom )*;
atom    : INT | FLOAT | ID | '(' expr ')';
condition   : cond ('or' cond)*;
cond    : c1 ('and' c1)*;
c1      : ('not')? c2;
c2      : '(' condition ')' | boolean;
boolean : expr (relop expr | ²) | 'true' | 'false';
relop   : '<' | '<=' | '>' | '>=' | '==' | '!=';

I have omitted the lexer rules for INT, FLOAT and ID, as they are obvious.

The problem is the c2 rule: it is ambiguous because '(' can start either alternative. I could not find a fix; can you suggest one?

What's the superscript `2` doing in `boolean`? – Bart Kiers Feb 15 '12 at 20:21

Bart Kiers · Accepted Answer · 2012-02-15T20:50:11.597

5

Why not simply do:

expr      : orExpr; 
orExpr    : andExpr ('or' andExpr)*;
andExpr   : relExpr ('and' relExpr)*;
relExpr   : addExpr (relop addExpr)?;
relop     : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr   : multExpr (('+' | '-') multExpr)*;
multExpr  : unaryExpr (('*' | '/') unaryExpr)*;
unaryExpr : 'not'? atom;
atom      : INT | FLOAT | ID | 'true' | 'false' | '(' expr ')';

The unary not usually has a higher precedence than the one you're giving it now.

This will allow for expressions like 42 > true, but checking such semantics can come when you're walking the AST/tree.
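As a rough illustration of such a later check, here is a minimal Python sketch that walks a tree and rejects ill-typed nodes like 42 > true. Everything in it is an assumption made for the example, not something ANTLR generates: the tree shape (nested tuples with the operator first and leaves kept as token strings), the names `type_of` and `SemanticError`, the `symbols` table, and the choice that `==` and `!=` only require both sides to have the same type.

```python
import re

ARITHMETIC = {"+", "-", "*", "/"}
ORDERING = {"<", "<=", ">", ">="}
EQUALITY = {"==", "!="}
LOGICAL = {"and", "or"}
NUMBER = re.compile(r"\d+(\.\d+)?")


class SemanticError(Exception):
    """Raised when an expression is syntactically valid but ill-typed."""


def type_of(node, symbols):
    """Return "num" or "bool" for a tree node, or raise SemanticError."""
    if isinstance(node, str):  # leaf: a literal or an identifier
        if node in ("true", "false"):
            return "bool"
        if NUMBER.fullmatch(node):
            return "num"
        if node not in symbols:
            raise SemanticError("undeclared identifier: " + node)
        return symbols[node]

    # Inner node: (operator, operand, ...). Type the operands first.
    op, operands = node[0], node[1:]
    types = [type_of(child, symbols) for child in operands]

    if op == "not" or op in LOGICAL:
        expected, result = "bool", "bool"
    elif op in ARITHMETIC:
        expected, result = "num", "num"
    elif op in ORDERING:
        expected, result = "num", "bool"
    elif op in EQUALITY:
        expected, result = types[0], "bool"  # both sides must merely agree
    else:
        raise SemanticError("unknown operator: " + op)

    if any(t != expected for t in types):
        raise SemanticError("operator %r applied to %s" % (op, types))
    return result
```

With this, `type_of((">", "42", "true"), {})` raises `SemanticError`, while a well-typed tree simply yields its type. The grammar itself stays free of any type distinctions.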

EDIT

The input "not(a+b >= 2 * foo/3.14159) == false" would now be parsed like this (ignoring spaces):

[image: parse tree for the input above]

And if you set the output to AST and mix in some tree rewrite operators (^ and !):

options {
  output=AST;
}

// ...

expr      : orExpr; 
orExpr    : andExpr ('or'^ andExpr)*;
andExpr   : relExpr ('and'^ relExpr)*;
relExpr   : addExpr (relop^ addExpr)?;
relop     : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr   : multExpr (('+' | '-')^ multExpr)*;
multExpr  : unaryExpr (('*' | '/')^ unaryExpr)*;
unaryExpr : 'not'^ atom | atom;
atom      : INT | FLOAT | ID | 'true' | 'false' | '('! expr ')'!;

you'd get:

[image: the resulting AST]
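To give an idea of the operator-rooted tree that the `^` and `!` operators produce, here is a hand-written recursive-descent sketch in Python that mirrors the rules above one method per rule. It is not ANTLR output. The tokenizer stands in for the omitted INT, FLOAT and ID lexer rules and is an assumption, as are the names `tokenize`, `Parser` and `parse` and the nested-tuple tree format.

```python
import re

# Assumed lexer: FLOAT, INT, ID/keywords, two-char relops, single-char symbols.
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z_]\w*|<=|>=|==|!=|[-+*/()<>])")
RELOPS = {"<", "<=", ">", ">=", "==", "!="}
RESERVED = RELOPS | {"+", "-", "*", "/", ")", "and", "or", "not"}


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def left_assoc(self, operand, operators):
        # Implements: operand (op^ operand)*  with the operator as tree root.
        node = operand()
        while self.peek() in operators:
            op = self.take()
            node = (op, node, operand())
        return node

    def expr(self):
        return self.or_expr()

    def or_expr(self):
        return self.left_assoc(self.and_expr, {"or"})

    def and_expr(self):
        return self.left_assoc(self.rel_expr, {"and"})

    def rel_expr(self):
        # addExpr (relop^ addExpr)?  -- at most one relational operator.
        node = self.add_expr()
        if self.peek() in RELOPS:
            op = self.take()
            node = (op, node, self.add_expr())
        return node

    def add_expr(self):
        return self.left_assoc(self.mult_expr, {"+", "-"})

    def mult_expr(self):
        return self.left_assoc(self.unary_expr, {"*", "/"})

    def unary_expr(self):
        if self.peek() == "not":
            self.take()
            return ("not", self.atom())
        return self.atom()

    def atom(self):
        token = self.take()
        if token == "(":  # the only place a '(' is ever consumed
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node  # parentheses are dropped, like '('! and ')'!
        if token in RESERVED:
            raise SyntaxError("unexpected token %r" % token)
        return token


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected token %r" % parser.peek())
    return tree
```

Calling `parse("not(a+b >= 2 * foo/3.14159) == false")` returns `('==', ('not', ('>=', ('+', 'a', 'b'), ('/', ('*', '2', 'foo'), '3.14159'))), 'false')`. Because `atom` is the single place that handles '(', the parser never has to guess what kind of expression a parenthesis opens.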

edited Feb 15 '12 at 20:50

answered Feb 15 '12 at 20:27

Bart Kiers

This works for conditional expressions, but I'm using expr (in your grammar: addExpr) for math stuff, so I think I should define a separate expr for that. Another thing: you have defined unaryExpr but not used it. – nafiseh Feb 15 '12 at 20:36
Thanks, I think your solution is correct, but I was wondering: is it wise to use syntactic predicates like this: `c2 : ( '(' atom ('*'atom)* (('+'|'-')atom('*'atom)*)* ')') => boolean | '(' condition ')'` – nafiseh Feb 15 '12 at 21:09
@nafiseh, my opinion: if you can, avoid predicates as much as possible. I know, sometimes you need them, but I prefer to construct the AST more loosely and then validate the semantic structure of the AST at a later stage: it keeps the grammar so much friendlier for the eyes! :) – Bart Kiers Feb 15 '12 at 21:13
OK, and one last question: what about backtracking? When it is enabled, does the grammar still need to be unambiguous? – nafiseh Feb 15 '12 at 21:17
@nafiseh, what do you mean? By using a predicate, the parser will follow (or "look") what you used in that predicate, and when it really "sees" it, it takes that path, and if it does not see what you've put in the predicate, it backtracks from it and takes an alternative path. But perhaps you meant global backtracking (`backtrack=true;` in the `options`)? I really wouldn't do that if you're just trying to fix a single ambiguity! That's like squashing a fly with a large iron frying pan :) – Bart Kiers Feb 15 '12 at 21:24
yes I meant global backtracking. Thank you very much Bart for your help! – nafiseh Feb 15 '12 at 21:30

Jerry Coffin · Answer 2 · 2012-02-15T21:03:14.140

Your problem stems from the fact that the '(' could be the start of either the first alternative for c2 or the last alternative for atom. Just for example, given input like ((x+y) > (a+b)), the first open paren is the beginning of a c2, but the second is the beginning of an atom. [edit: And the parser has no indication of which way to go until some arbitrary point later -- for example, it can't know that the first open paren is the beginning of a c2 until it encounters the >. If that were a * instead, then both open parens would be beginnings of atoms.]
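To make the lookahead problem concrete, here is a minimal Python sketch (not ANTLR, and not part of the grammar in the question) that scans forward from an open paren until it can tell which rule the paren belongs to. The function name `classify_paren` and the pre-split token list are assumptions for illustration. A paren is reported as the start of '(' condition ')' as soon as a relational or Boolean token shows up before its matching close. Otherwise it is reported as an atom, which is an arbitrary choice for inputs like (x), where the original grammar allows both readings.

```python
BOOLEAN_TOKENS = {"<", "<=", ">", ">=", "==", "!=",
                  "and", "or", "not", "true", "false"}


def classify_paren(tokens, start):
    """Decide what the '(' at tokens[start] begins.

    Returns (kind, lookahead), where kind is "condition" or "atom" and
    lookahead is the number of tokens examined before deciding.
    """
    if tokens[start] != "(":
        raise ValueError("expected '(' at position %d" % start)
    depth = 0
    for i in range(start, len(tokens)):
        token = tokens[i]
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth == 0:
                # Matching close reached without any Boolean token inside.
                return "atom", i - start + 1
        elif token in BOOLEAN_TOKENS:
            # Only a condition may contain this token inside the paren.
            return "condition", i - start + 1
    raise ValueError("unbalanced parentheses")


tokens = ["(", "(", "x", "+", "y", ")", ">", "(", "a", "+", "b", ")", ")"]
print(classify_paren(tokens, 0))  # ('condition', 7)
print(classify_paren(tokens, 1))  # ('atom', 5)
```

The number of tokens examined grows with the size of the parenthesized sub-expression, which is why no fixed amount of lookahead settles the choice.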

One possible way to handle it would be to unify the rules for arithmetic and Boolean expressions, so you only have one rule with '(' expression ')', and the expression might be arithmetic or Boolean. This often, however, has the side effect of producing rather loose typing, with relatively free conversion between arithmetic and Boolean expressions (at least at the parser level -- you can then enforce the types as rigidly as you like in the semantics).

Edit: In Pascal, for example, the rules run something like this (simplifying a tiny bit):

expression: simple_expression ( rel_op simple_expression )*

simple_expression: ( '+' | '-')? term ( ('+' | '-' | 'or' ) term )*

term: factor ( ( '/' | '*' | 'div' | 'mod' | 'and') factor )*

factor: constant | variable | function_call | '(' expression ')' | 'not' factor

Yes, I think I have to do this. One question: how do well-known languages like Java, Pascal, etc. resolve this? Do they do the same thing in their grammars, or do they use different approaches such as backtracking? – nafiseh Feb 15 '12 at 20:51
Depends on the language. Fortran uses loose rules and backtracking. Pascal uses about what I've outlined above -- one set of rules for all expressions, Boolean or otherwise. Most others do roughly what Pascal does. See edited answer -- I've added the rules from Pascal. – Jerry Coffin Feb 15 '12 at 21:04

score 0 · Answer 3 · answered Feb 15 '12 at 19:58

Couldn't you define c1 as the following?

('not')? (('(' condition ')') | boolean)

Scott Hunter

Wouldn't this still be ambiguous with the `atom` rule? – Bill Feb 15 '12 at 20:00
No, the problem still exists: boolean goes to expr, expr goes to multExpr, then to atom, and then to '(' expr ')'. – nafiseh Feb 15 '12 at 20:07

score 0 · Answer 4 · answered Feb 15 '12 at 19:58

One way to approach this problem is to split it into two sets of lexer rules and apply them sequentially to the input (one for the math stuff, the other for the boolean).

Bill

so you mean I should not go to expr from boolean anymore? – nafiseh Feb 15 '12 at 20:10
Depends on what you are trying to achieve. You could separate them and create a booleanexpr instead of reusing expr. It would be helpful if you could list some sample valid inputs. Is `true and (4 + 3) /4` a valid expression for example? – Bill Feb 15 '12 at 20:26
yes, then it would be like what Bart suggested, and I need to define 2 types of expressions. actually I have a grammar for the whole of a language, I have math expressions too. – nafiseh Feb 15 '12 at 20:40
