So I defined a grammar to parse a C-style language:

grammar mygrammar;

program
: (declaration)*
  (statement)*
  EOF
;

declaration
: INT ID '=' expression ';'
;

assignment
: ID '=' expression ';'
;

expression
: expression (op=('*'|'/') expression)*
| expression (op=('+'|'-') expression)*
| relation
| INT
| ID
| '(' expression ')'
;

relation
: expression (op=('<'|'>') expression)*
;

statement
: expression ';'
| ifstatement
| loopstatement
| printstatement
| assignment
;

ifstatement
: IF '(' expression ')' (statement)* FI ';'
;

loopstatement
: LOOP '(' expression ')' (statement)* POOL ';'
;

printstatement
: PRINT '(' expression ')' ';'
;

IF : 'if';
FI : 'fi';
LOOP : 'loop';
POOL : 'pool';
INT : 'int';
PRINT : 'print';
ID : [a-zA-Z][a-zA-Z0-9]*;
INTEGER : [0-9]+;
WS : [ \r\n\t] -> skip;

And I can parse a simple test like this:

int i = (2+3)*3/2*(3+36);
int j = i;
int k = 2*1+i*3;
if (k > 2)
  k = k + 1;
  i = i / 3;
  j = j / 3;
fi;
loop (i < 10)
  i = i + 1 * (i+k);
  j = (j + 1) * (j-k);
  k = i + j;
  print(k);
pool;

However, when I try to generate the ANTLR recognizer in IntelliJ, I get this error:

sCalc.g4:19:0: left recursive rule expression contains a left recursive alternative which can be followed by the empty string

I wonder whether this is caused by my ID rule possibly matching an empty string?

Bastien Jansen
RandomEli

2 Answers

The problem lies in your expression and relation rules. The expression rule can match relation in one alternative, which in turn recurses back to expression. In addition, the (op=('<'|'>') expression)* part of relation can match nothing.

A better approach is probably to have relation call expression and remove the relation alternative from expression. Then use relation everywhere you currently use expression. That's the typical pattern for expression grammars: start with the lowest-precedence operations as top-level rules and drill down to higher-precedence rules, ultimately ending at a simple expression rule (or similar).
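
A minimal sketch of that layering, assuming the rest of the grammar stays as posted. It uses INTEGER for numeric literals (an assumption; the posted expression rule lists INT, which the lexer defines as the keyword 'int'), and letting parentheses wrap a full relation is a design choice rather than something stated above:

```antlr
// Lowest precedence: comparisons sit on top and call down into expression.
relation
: expression (op=('<'|'>') expression)*
;

// Higher precedence: arithmetic only. There is no relation alternative here,
// so expression is no longer indirectly left recursive through relation.
expression
: expression op=('*'|'/') expression
| expression op=('+'|'-') expression
| INTEGER              // assumption: numeric literal token, not the 'int' keyword
| ID
| '(' relation ')'     // assumption: parentheses may contain a comparison
;
```

With this split, declaration, assignment, the expression statement, ifstatement, loopstatement and printstatement would reference relation where they currently reference expression.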

Mike Lischke

There are a couple of issues with your grammar:

  • you have INT as an alternative inside expression, while you probably want INTEGER instead
  • there is no need for expression (op=('+'|'-') expression)*; expression op=('+'|'-') expression will do
  • ANTLR4 does not support indirectly left-recursive rules, so you must fold relation into expression

Something like this ought to do it:

grammar mygrammar;

program
: (declaration)*
  (statement)*
  EOF
;

declaration
: INT ID '=' expression ';'
;

assignment
: ID '=' expression ';'
;

expression
: expression op=('*'|'/') expression
| expression op=('+'|'-') expression
| expression op=('<'|'>') expression
| INTEGER
| ID
| '(' expression ')'
;

statement
: expression ';'
| ifstatement
| loopstatement
| printstatement
| assignment
;

ifstatement
: IF '(' expression ')' (statement)* FI ';'
;

loopstatement
: LOOP '(' expression ')' (statement)* POOL ';'
;

printstatement
: PRINT '(' expression ')' ';'
;

IF : 'if';
FI : 'fi';
LOOP : 'loop';
POOL : 'pool';
INT : 'int';
PRINT : 'print';
ID : [a-zA-Z][a-zA-Z0-9]*;
INTEGER : [0-9]+;
WS : [ \r\n\t] -> skip;

Also note that (statement)* can simply be written as statement*
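
For instance, the rules that repeat a single element could be written without the extra parentheses. This is a fragment only, on the assumption that every other rule stays exactly as in the grammar above:

```antlr
// Same language as before; parentheses are only needed around
// a group of several elements that repeat together.
program
: declaration* statement* EOF
;

ifstatement
: IF '(' expression ')' statement* FI ';'
;

loopstatement
: LOOP '(' expression ')' statement* POOL ';'
;
```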

Bart Kiers
  • Thanks, that works. But I also want to know how ANTLR4 can simply match zero or more statements without grouping them using `'()'` – RandomEli Sep 27 '16 at 16:13
  • Err, I don't understand. `(a)*` is simply the same as `a*`. If you have more than one thing to repeat, then you need to put parentheses around them: `(a b)*` (not the same as `a b*`) – Bart Kiers Sep 27 '16 at 18:52