
I'm taking a first stab at creating a grammar for expressions like:

(foo = bar or (bar = "bar" and baz = 45.43)) and test = true

My grammar so far looks like:

grammar filter;

tokens {
    TRUE = 'true';
    FALSE = 'false';
    AND = 'and';
    OR = 'or';
    LT = '<';
    GT = '>';
    EQ = '=';
    NEQ = '!=';
    PATHSEP = '/';
    LBRACK = '[';
    RBRACK = ']';
    LPAREN = '(';
    RPAREN = ')';
}

expression : or_expression EOF;

or_expression : and_expression (OR or_expression)*;

and_expression : term (AND term)*;

term : atom ( operator atom)? | LPAREN expression RPAREN;

atom : ID | INT | FLOAT | STRING | TRUE | FALSE;

operator : LT | GT | EQ | NEQ;

INT : '0'..'9'+;
FLOAT : ('0'..'9')+ '.' ('0'..'9')*;
STRING : '"' ('a'..'z'|'A'..'Z'|'_'|' ')* '"';
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
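To see how the lexer rules above split the sample input into tokens, here is a rough Python mirror of them. It is a hypothetical sketch, not code generated by ANTLR, and it makes two assumptions. First, whitespace is skipped, even though the grammar itself defines no whitespace rule. Second, the keywords from the tokens { } block are recognised by matching an ID and then looking the text up, which approximates how the literal tokens win over ID on an exact match.

```python
import re

# Hypothetical Python mirror of the ANTLR lexer rules (not ANTLR output).
# Assumption: whitespace is skipped; the grammar above has no WS rule.
TOKEN_SPEC = [
    ("WS", r"[ \t\r\n]+"),
    ("FLOAT", r"[0-9]+\.[0-9]*"),  # tried before INT so "45.43" is one token
    ("INT", r"[0-9]+"),
    ("STRING", r'"[a-zA-Z_ ]*"'),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("NEQ", r"!="),                # tried before EQ
    ("LT", r"<"),
    ("GT", r">"),
    ("EQ", r"="),
    ("PATHSEP", r"/"),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]
# Literal tokens from the tokens { } block that look like identifiers.
KEYWORDS = {"true": "TRUE", "false": "FALSE", "and": "AND", "or": "OR"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (type, text) pairs, ending with an EOF marker."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue  # drop whitespace instead of emitting a token
        if kind == "ID":
            kind = KEYWORDS.get(value, "ID")  # exact keyword match beats ID
        tokens.append((kind, value))
    tokens.append(("EOF", ""))
    return tokens
```

Calling tokenize on the sample input yields LPAREN ID EQ ID OR LPAREN ID EQ STRING AND ID EQ FLOAT RPAREN RPAREN AND ID EQ TRUE, followed by the EOF marker.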

But in ANTLRWorks 1.4.3, I get the parse tree:

[Image: the resulting parse tree for the above input]

But for the life of me I can't figure out what is wrong with my grammar. Which token is missing here?

Many thanks in advance.

Edit: To clarify the atom (operator atom)? alternative in the term production, I should mention that atoms must be able to stand alone without being compared to another atom. For example, a or b is a valid expression.

estan


I'm answering my own question here. I found two problems with my grammar. The first was easy to spot: I had put EOF at the end of my top-level rule:

expression : or_expression EOF;

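The EOF was thus the missing token. Because term recurses into expression for a parenthesized subexpression (LPAREN expression RPAREN), the parser demanded EOF before every closing parenthesis. My solution was to remove the EOF from the expression rule and instead introduce a new top-level rule above it: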

filter: expression EOF;
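The effect of moving EOF can be reproduced with a small recursive-descent sketch in Python. It is hypothetical, not ANTLR-generated code: to keep it short it works on pre-lexed token types, collapsing the atom alternatives into a single ATOM type and the comparison operators into OP. The eof_inside_expression flag mimics the original grammar, where the EOF in expression is also required inside parentheses.

```python
class Parser:
    """Recursive-descent sketch of the grammar's rules over token-type strings."""

    def __init__(self, tokens, eof_inside_expression):
        self.tokens = list(tokens)
        self.pos = 0
        # True reproduces the original `expression : or_expression EOF;`
        self.eof_inside_expression = eof_inside_expression

    def peek(self):
        # Reading past the end of the input yields EOF.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()} at token {self.pos}")
        self.pos += 1

    def filter(self):
        # filter : expression EOF ;  (the corrected top-level rule)
        self.expression()
        self.expect("EOF")

    def expression(self):
        # corrected: expression : or_expression ;
        # original:  expression : or_expression EOF ;
        self.or_expression()
        if self.eof_inside_expression:
            self.expect("EOF")

    def or_expression(self):
        # or_expression : and_expression (OR and_expression)* ;
        self.and_expression()
        while self.peek() == "OR":
            self.pos += 1
            self.and_expression()

    def and_expression(self):
        # and_expression : term (AND term)* ;
        self.term()
        while self.peek() == "AND":
            self.pos += 1
            self.term()

    def term(self):
        # term : atom (operator atom)? | LPAREN expression RPAREN ;
        if self.peek() == "LPAREN":
            self.pos += 1
            self.expression()  # in the original layout this also demands EOF
            self.expect("RPAREN")
        else:
            self.expect("ATOM")
            if self.peek() == "OP":
                self.pos += 1
                self.expect("ATOM")


def accepts(tokens, original):
    """Return True if the token types parse under the original or corrected layout."""
    parser = Parser(tokens, eof_inside_expression=original)
    try:
        if original:
            parser.expression()  # original top-level rule, which already ends in EOF
        else:
            parser.filter()
        return True
    except SyntaxError:
        return False
```

With the original layout, a nested input such as ( a = b ) and c fails with "expected EOF, found RPAREN", which matches the missing-token symptom; with EOF only in filter, the same input is accepted while trailing junk is still rejected.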

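The second problem was that my or_expression rule was ambiguous: when a nested or_expression is followed by OR, either the nested call's loop or the enclosing loop could consume it. Repeating and_expression inside the loop describes the same language without that ambiguity, so the rule should be: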

or_expression : and_expression (OR and_expression)*;

and not

or_expression : and_expression (OR or_expression)*;
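To see what the loop form of or_expression produces, here is a small Python sketch. It is hypothetical and deliberately reduced: it handles only and, or, parentheses and free-standing atoms (no comparisons), and it builds nested tuples as its tree. Each loop folds the next operand onto the left, so chains come out left-associative and and binds tighter than or, while the end-of-input check lives only in the outermost call.

```python
def parse(tokens):
    """Parse a list like ["a", "or", "b", "and", "c"] into nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def or_expression():
        # or_expression : and_expression (OR and_expression)* ;
        nonlocal pos
        node = and_expression()
        while peek() == "or":
            pos += 1
            node = ("or", node, and_expression())  # fold onto the left
        return node

    def and_expression():
        # and_expression : term (AND term)* ;
        nonlocal pos
        node = term()
        while peek() == "and":
            pos += 1
            node = ("and", node, term())
        return node

    def term():
        # term : atom | LPAREN expression RPAREN ;  (comparisons omitted)
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            node = or_expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        if tok is None or tok in ("and", "or", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        pos += 1
        return tok  # free-standing atom, e.g. the a in `a or b`

    tree = or_expression()
    if pos != len(tokens):  # the EOF check, done only at the top level
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

For example, parse(["a", "or", "b", "or", "c"]) returns ("or", ("or", "a", "b"), "c"), and parse(["a", "or", "b", "and", "c"]) returns ("or", "a", ("and", "b", "c")).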

The full corrected grammar is:

grammar filter;

tokens {
    TRUE = 'true';
    FALSE = 'false';
    AND = 'and';
    OR = 'or';
    LT = '<';
    GT = '>';
    EQ = '=';
    NEQ = '!=';
    PATHSEP = '/';
    LBRACK = '[';
    RBRACK = ']';
    LPAREN = '(';
    RPAREN = ')';
}

filter: expression EOF;

expression : or_expression;

or_expression : and_expression (OR and_expression)*;

and_expression : term (AND term)*;

term : atom (operator atom)? | LPAREN expression RPAREN;

atom : ID | INT | FLOAT | STRING | TRUE | FALSE;

operator : LT | GT | EQ | NEQ;

INT : '0'..'9'+;
FLOAT : ('0'..'9')+ '.' ('0'..'9')*;
STRING : '"' ('a'..'z'|'A'..'Z'|'_'|' ')* '"';
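ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
// send whitespace to the hidden channel so the parser never sees it
WS : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; };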

And the resulting parse tree is:

[Image: the correct parse tree]
