3

I've a pretty basic math expression grammar for ANTLR here and what's of interest is handling the implied * operator between parentheses e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).

Given the input (2-3)(4+5)(6*7) I'm trying to add the missing * operator to the AST tree while parsing, in the following grammar I think I've managed to achieve that but I'm wondering if this is the correct, most elegant way?

grammar G; 

options {
    language = Java;
    output=AST;
ASTLabelType=CommonTree;
}

tokens {
  ADD = '+' ;
  SUB = '-' ;
  MUL = '*' ;
  DIV = '/' ;
  OPARN = '(' ;
  CPARN = ')' ;
}

start
    : expression EOF!
    ;

expression
    : mult (( ADD^ | SUB^ ) mult)*
    ;

mult
   : atom (( MUL^ | DIV^) atom)*    
   ;

atom
   : INTEGER
   | (
       OPARN  expression CPARN -> expression
     )

     (
       OPARN  expression CPARN -> ^(MUL expression)+
     )*  
   ;


INTEGER : ('0'..'9')+ ;
WS  : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};

This grammar appears to output the correct AST Tree in ANTLRworks:

AST Output

I'm only just starting to get to grips with parsing and ANTLR, don't have much experience so feedback with really appreciated!

Thanks in advance! Carl

Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
EraserheadIRL
  • 1,061
  • 2
  • 12
  • 20

1 Answers1

3

First of all, you did a great job given the fact that you've never used ANTLR before.

You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:

options {
  output=AST;
}

Also, you don't have to specify the root node for each operator separately. So you don't have to do:

(ADD^ | SUB^)

but the following:

(ADD | SUB)^

will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.

Now, for you AST: you'll probably want to create a binary tree: that way, all internal nodes are operators, and the leafs will be operands which makes the actual evaluating of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:

atom
   : INTEGER
   | (
       OPARN  expression CPARN -> expression
     )
     (
       OPARN  e=expression CPARN -> ^(MUL $atom $e)
     )*  
   ;

which produces the following AST given the input "(2-3)(4+5)(6*7)":

enter image description here

(image produced by: graphviz-dev.appspot.com)

The DOT file was generated with the following test-class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
    GParser parser = new GParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.start().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}
Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • Wow, thanks for your help Bart, that's super! I'll have to study that for a bit first, let it digest! – EraserheadIRL Jun 15 '11 at 18:08
  • @EraserHead IRL, yeah, I can imagine that. The rewrite rule `-> ^(MUL $atom $e)` might cause some headache. Realize that `$atom` refers to the entire rule that was matched, so the first `OPARN expression CPARN` is included because of it. – Bart Kiers Jun 15 '11 at 18:44
  • Got it now!, I think ANTLRworks and ANTLR IDE are tripping me up because I've noticed that some grammars wont run in those IDE's but work fine in Java, guess they themselves don't fully support ANTLR's grammar or there a bit buggy. – EraserheadIRL Jun 17 '11 at 08:45
  • @EraserHead IRL, yes, I only use ANTLRWorks to edit grammars, not for testing the parser since the interpreter is rather buggy. Best do it using a separate class, IMO. – Bart Kiers Jun 17 '11 at 08:58