
I have already looked at the question "Parsing Math". Even though the titles seem to be the same, it doesn't answer my question, at least not in any way that I can understand.

Here is what I am parsing:

PI -> 3.14.
Number area(Number radius) -> PI * radius^2.

This is how I want my AST to look, minus all the useless root nodes:

(Image: how the tree should look) http://vertigrated.com/images/How%20I%20want%20the%20tree%20to%20look.png

Here are what I hope are the relevant fragments of my grammar:

term : '(' expression ')'
     | number -> ^(NUMBER number)
     | (function_invocation)=> function_invocation 
     | ATOM
     | ID
     ;

power : term ('^' term)* -> ^(POWER term (term)* ) ;
unary : ('+'! | '-'^)* power ;
multiply : unary ('*' unary)* -> ^(MULTIPLY unary (unary)* ) ;
divide : multiply ('/' multiply)* -> ^(DIVIDE multiply (multiply)* );
modulo : divide ('%' divide)* -> ^(MODULO divide (divide)*) ;
subtract : modulo ('-' modulo)* -> ^(SUBTRACT modulo (modulo)* ) ;  
add : subtract ('+' subtract)* -> ^(ADDITION subtract (subtract)*) ;

relation : add (('=' | '!=' | '<' | '<=' | '>=' | '>') add)* ;

expression : relation (and_or relation)*
           | string  
           | container_access
           ;
and_or : '&' | '|' ;
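
To see where the extra nodes come from, here is a rough Python model of the arithmetic rules above. It is a hypothetical stand-in for the generated parser, not ANTLR code, and `LEVELS`, `tokenize`, `parse` and `to_string_tree` are illustration-only names. Each level always wraps its operands in a node, as `-> ^(LABEL x (x)*)` does, even when only one operand is present. Unary operators and parentheses are left out.

```python
import re

# Precedence levels from loosest (add) to tightest (power), mirroring the
# question's rules. Unary operators and parentheses are omitted for brevity.
LEVELS = [("+", "ADDITION"), ("-", "SUBTRACT"), ("%", "MODULO"),
          ("/", "DIVIDE"), ("*", "MULTIPLY"), ("^", "POWER")]

def tokenize(src):
    # Numbers (e.g. 3.14), identifiers, and single-character operators.
    return re.findall(r"\d+(?:\.\d+)?|\w+|[-+*/%^]", src)

def parse(tokens, level=0):
    """Mimics `rule : next (OP next)* -> ^(LABEL next (next)*)`."""
    if level == len(LEVELS):
        return tokens.pop(0)                  # a term: number or identifier
    op, label = LEVELS[level]
    operands = [parse(tokens, level + 1)]
    while tokens and tokens[0] == op:
        tokens.pop(0)                         # consume the operator
        operands.append(parse(tokens, level + 1))
    return (label, *operands)                 # always a node, even for one operand

def to_string_tree(t):
    # LISP-style rendering: (ROOT child1 child2 ...)
    if isinstance(t, str):
        return t
    return "(" + " ".join(to_string_tree(c) for c in t) + ")"

print(to_string_tree(parse(tokenize("PI * radius^2"))))
# (ADDITION (SUBTRACT (MODULO (DIVIDE (MULTIPLY (POWER PI) (POWER radius 2))))))
```

Only MULTIPLY and the second POWER node have more than one child; the other five nodes are the useless single-child wrappers.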

Precedence

I still want to keep the precedence as illustrated in the following diagrams, but want to eliminate the useless nodes if at all possible.

Source: Number a(x) -> 0 - 1 + 2 * 3 / 4 % 5 ^ 6.

Here are the nodes I want to eliminate:

(Image: how I want the precedence tree to look) http://vertigrated.com/images/example%202%20desired%20result.png

Basically, I want to eliminate every node that does not directly have two or more operands (an actual binary operation) beneath it.

4 Answers

Your rule (and others like it)

 add : subtract ('+' subtract)* -> ^(ADDITION subtract (subtract)*) ;

produces the useless node when you don't have a sequence of add operations.

I'm not an ANTLR expert, but I'd guess you need two cases: one for a sequence of two or more operands, which generates your standard tree, and one for a single (unary) operand, which simply passes the child tree up to the parent without creating a new node.

add : subtract ( ('+' subtract)+ -> ^(ADDITION subtract (subtract)*) 
               | -> subtract ) ;

Make similar changes to the other rules that apply an operator to a sequence of operands.
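
A rough Python model of this fix, as a hypothetical stand-in for the generated parser (not ANTLR code; `LEVELS`, `tokenize`, `parse` and `to_string_tree` are illustration-only names). The levels follow the question's rules, and when a level finds only one operand it returns that operand unchanged instead of wrapping it.

```python
import re

# Precedence levels from loosest (add) to tightest (power), as in the question.
LEVELS = [("+", "ADDITION"), ("-", "SUBTRACT"), ("%", "MODULO"),
          ("/", "DIVIDE"), ("*", "MULTIPLY"), ("^", "POWER")]

def tokenize(src):
    # Numbers (e.g. 3.14), identifiers, and single-character operators.
    return re.findall(r"\d+(?:\.\d+)?|\w+|[-+*/%^]", src)

def parse(tokens, level=0):
    """Mimics `rule : next ( (OP next)+ -> ^(LABEL next (next)*) | -> next ) ;`."""
    if level == len(LEVELS):
        return tokens.pop(0)                  # a term: number or identifier
    op, label = LEVELS[level]
    operands = [parse(tokens, level + 1)]
    while tokens and tokens[0] == op:
        tokens.pop(0)                         # consume the operator
        operands.append(parse(tokens, level + 1))
    if len(operands) == 1:
        return operands[0]                    # `-> next`: pass the single operand up
    return (label, *operands)                 # two or more operands: build the node

def to_string_tree(t):
    # LISP-style rendering: (ROOT child1 child2 ...)
    if isinstance(t, str):
        return t
    return "(" + " ".join(to_string_tree(c) for c in t) + ")"

print(to_string_tree(parse(tokenize("PI * radius^2"))))  # (MULTIPLY PI (POWER radius 2))
print(to_string_tree(parse(tokenize("1 + 2 + 3"))))      # (ADDITION 1 2 3)
```

The second line shows the trade-off discussed in the other answers: a run of the same operator becomes one node with three children rather than nested binary nodes.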

Ira Baxter
  • With the syntax fix I made to your answer, this worked like a charm! Honestly, I don't actually understand the syntax or how it works, but it does work. – Nov 16 '12 at 06:06
  • What it says is: "If you find a single (unary) operand at one level of precedence in your grammar, simply pass the tree for that operand up to your parent; if you find two or more operands at a single precedence level, create an operator node for that precedence level and insert the operands as its children." – Ira Baxter Nov 16 '12 at 06:43

You must realize that the two rules:

add : sub ( ('+' sub)+ -> ^(ADD sub (sub)*) | -> sub ) ;

and

add : sub ('+'^ sub)* ;

do not produce the same AST. Given the input 1+2+3, the first rule will produce:

  ADD
   |
.--+--.
|  |  |
1  2  3

where the second rule produces:

     (+)
      |
   .--+--.  
   |     |
  (+)    3
   |
.--+--.
|     |
1     2

The latter makes more sense: infix expressions are expected to have 2 child nodes, not more.

Why not simply remove the literals in your parser rules and just do:

add : sub (ADD^ sub)*;

ADD : '+';

Creating the same AST using a rewrite rule would look like this:

add : (sub -> sub) ('+' s=sub -> ^(ADD $add $s))*;

Also see Chapter 7, "Tree Construction", of The Definitive ANTLR Reference, especially the sections "Rewrite Rules in Subrules" (page 173) and "Referencing Previous Rule ASTs in Rewrite Rules" (pages 174-175).
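
The `$add` reference is what makes the tree nest to the left: the rule's tree starts out as the first `sub`, and each `'+' sub` iteration wraps the tree built so far in a new `ADD` node. A minimal Python sketch of that loop, assuming each `sub` is already a single token (`parse_add` and `to_string_tree` are hypothetical names, not ANTLR API):

```python
def parse_add(tokens):
    """Mimics `add : (sub -> sub) ('+' s=sub -> ^(ADD $add $s))*;`
    with each `sub` simplified to a single token."""
    tree = tokens.pop(0)              # (sub -> sub): start with the first operand
    while tokens and tokens[0] == "+":
        tokens.pop(0)                 # consume '+'
        s = tokens.pop(0)             # s=sub
        tree = ("ADD", tree, s)       # ^(ADD $add $s): wrap the tree built so far
    return tree

def to_string_tree(t):
    # LISP-style rendering: (ROOT child1 child2 ...)
    if isinstance(t, str):
        return t
    return "(" + " ".join(to_string_tree(c) for c in t) + ")"

print(to_string_tree(parse_add("1 + 2 + 3".split())))  # (ADD (ADD 1 2) 3)
print(to_string_tree(parse_add(["7"])))                # 7
```

With a single operand the loop never runs, so no `ADD` node is created at all; that is what removes the useless nodes.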

Bart Kiers
  • Not using the rewrite rules puts the literals in my AST; just out of OCD-ness, I want the word `ADD` instead of `+`. – Nov 16 '12 at 13:29
  • In an AST, one usually wants all the operands of an associative/commutative operator as children (they really are all being "ADD"ed). You probably want to restrict non-commutative operators such as subtract to binary nodes. – Ira Baxter Nov 16 '12 at 15:56
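
Ira's point about associativity can be checked with a small evaluator sketch in Python (hypothetical, working on tuple trees rather than ANTLR trees; `OPS`, `eval_left` and `eval_right` are illustration-only names). Folding an n-ary `ADD` node left to right or right to left gives the same sum, but for an n-ary `SUBTRACT` node the fold direction changes the result, so the tree walker has to know the convention. A binary, left-nested tree carries the grouping explicitly.

```python
import operator
from functools import reduce

OPS = {"ADD": operator.add, "SUBTRACT": operator.sub}

def eval_left(tree):
    """Evaluate a tuple tree, folding each node's operands left to right."""
    if isinstance(tree, int):
        return tree
    label, *children = tree
    values = [eval_left(c) for c in children]
    return reduce(OPS[label], values)            # ((a op b) op c) ...

def eval_right(tree):
    """Same, but folding right to left: a op (b op c) ..."""
    if isinstance(tree, int):
        return tree
    label, *children = tree
    values = [eval_right(c) for c in children]
    return reduce(lambda acc, x: OPS[label](x, acc), reversed(values))

flat_sub = ("SUBTRACT", 10, 3, 2)                  # n-ary node
nested_sub = ("SUBTRACT", ("SUBTRACT", 10, 3), 2)  # binary, left-nested
print(eval_left(flat_sub), eval_right(flat_sub))      # 5 9
print(eval_left(nested_sub), eval_right(nested_sub))  # 5 5
```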

Even though I accepted Bart's answer as correct, I wanted to post my own complete answer, with the example code I got working, for completeness.

Here is what I did based on Bart's answer:

unary : ('+'! | '-'^)? term ;
pow : (unary -> unary) ('^' s=unary -> ^(POWER $pow $s))*;
mod : (pow -> pow) ('%' s=pow -> ^(MODULO $mod $s))*;
mult : (mod -> mod) ('*' s=mod -> ^(MULTIPLY $mult $s))*;
div : (mult -> mult) ('/' s=mult -> ^(DIVIDE $div $s))*;
sub : (div -> div) ('-' s=div -> ^(SUBTRACT $sub $s))*;
add : (sub -> sub) ('+' s=sub -> ^(ADD $add $s))*;

And here is what the resulting tree looks like:

(Image: the working answer's tree) http://vertigrated.com/images/working_answer.png
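
To check the grouping these six rules produce, here is a rough Python model of them. It is a hypothetical stand-in for the generated parser, not ANTLR code, and all names are illustration-only. It builds left-nested nodes the way `(x -> x) (OP s=x -> ^(LABEL $rule $s))*` does and roots unary minus at `-` as `'-'^` does.

```python
import re

# Precedence levels from loosest (add) to tightest (pow), as in the rules above.
LEVELS = [("+", "ADD"), ("-", "SUBTRACT"), ("/", "DIVIDE"),
          ("*", "MULTIPLY"), ("%", "MODULO"), ("^", "POWER")]

def tokenize(src):
    # Numbers (e.g. 3.14), identifiers, and single-character operators.
    return re.findall(r"\d+(?:\.\d+)?|\w+|[-+*/%^]", src)

def parse_unary(tokens):
    # unary : ('+'! | '-'^)? term ;   (term limited to numbers/identifiers here)
    if tokens[0] == "+":
        tokens.pop(0)                      # '+'! is dropped from the tree
    elif tokens[0] == "-":
        tokens.pop(0)
        return ("-", tokens.pop(0))        # '-'^ becomes the root
    return tokens.pop(0)

def parse(tokens, level=0):
    if level == len(LEVELS):
        return parse_unary(tokens)
    op, label = LEVELS[level]
    tree = parse(tokens, level + 1)        # (next -> next)
    while tokens and tokens[0] == op:
        tokens.pop(0)                      # consume the operator
        s = parse(tokens, level + 1)       # s=next
        tree = (label, tree, s)            # -> ^(LABEL $rule $s)
    return tree

def to_string_tree(t):
    # LISP-style rendering: (ROOT child1 child2 ...)
    if isinstance(t, str):
        return t
    return "(" + " ".join(to_string_tree(c) for c in t) + ")"

print(to_string_tree(parse(tokenize("0 - 1 + 2 * 3 / 4 % 5 ^ 6"))))
# (ADD (SUBTRACT 0 1) (DIVIDE (MULTIPLY 2 3) (MODULO 4 (POWER 5 6))))
```

Because `*` and `/` sit on separate levels here, with DIVIDE looser than MULTIPLY, an input like `8 / 2 * 2` groups as `(DIVIDE 8 (MULTIPLY 2 2))`, i.e. 8 / (2 * 2). That follows from the precedence the question asked for, but it differs from the usual left-to-right reading of `*` and `/`, so it may be worth double-checking if conventional arithmetic is intended.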

An alternative solution is to not use the rewrites at all and promote the operator symbols themselves to roots, but I want descriptive labels throughout my tree if at all possible. I am just being picky about how the tree is represented so that my tree-walking code will be as clean as possible! That version of the grammar is:

power : unary ('^'^ unary)* ;
mod : power ('%'^ power)* ;
mult : mod ('*'^ mod)* ;
div : mult ('/'^ mult)* ;
sub : div ('-'^ div)* ;
add : sub ('+'^ sub)* ;

And the resulting tree looks like this:

(Image: the tree without the rewrites) http://vertigrated.com/images/without_the_rewrites.png

  • The "alternate" method is preferable in every way. The generated code is much faster, much smaller (both code size and memory usage at runtime), and unlike the version using rewrites it actually includes source information for the operators. If you have an issue with labeling, you should create your own `toStringTree()` method or similar, because the rewrites you decided on are doing nothing but holding you back. – Sam Harwell Nov 16 '12 at 13:38
  • @280Z28 I appreciate your input, and I know I am using a much more complicated solution. I had it working with my *alternative*, but I have my reasons (mostly, I am trying to learn all the dark corners of ANTLR). – Nov 16 '12 at 15:22
  • That's fine. I just wanted to make it clear that you are not talking about two equivalent solutions to the same problem. One of the solutions is superior to the other in every single metric, and (based on responses to this question) it's apparently not the one people think. – Sam Harwell Nov 16 '12 at 16:08
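
Sam's suggestion of a custom `toStringTree()`-style method can be sketched in Python on tuple trees. This is a hypothetical stand-in: a real version would be written against ANTLR's tree classes, and `LABELS`, `to_labeled_string` and the `NEGATE` label are illustration-only. The parser keeps the operator tokens as roots, and the descriptive names are applied only when the tree is printed. Unary minus needs its own name, because the no-rewrite grammar also roots it at a `-` token.

```python
# Descriptive names for operator tokens; applied only when printing.
LABELS = {"+": "ADD", "-": "SUBTRACT", "*": "MULTIPLY",
          "/": "DIVIDE", "%": "MODULO", "^": "POWER"}

def to_labeled_string(tree):
    if isinstance(tree, str):
        return tree
    op, *children = tree
    if op == "-" and len(children) == 1:
        name = "NEGATE"                    # hypothetical label for unary '-'^
    else:
        name = LABELS.get(op, op)          # unknown roots keep their token text
    parts = [name] + [to_labeled_string(c) for c in children]
    return "(" + " ".join(parts) + ")"

# A tree shaped like the one `add : sub ('+'^ sub)* ;` builds for 1+2+3.
print(to_labeled_string(("+", ("+", "1", "2"), "3")))  # (ADD (ADD 1 2) 3)
print(to_labeled_string(("-", "5")))                   # (NEGATE 5)
```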

To get rid of the irrelevant nodes, just be explicit:

subtract
    : modulo
      ( ('-' modulo)+ -> ^(SUBTRACT modulo+)  // no need for parentheses or an asterisk
      | ()            -> modulo
      )
    ;

Apalala
  • I don't see how this answer is different from Ira's. But, like I previously mentioned, this potentially creates an AST like `^(SUBTRACT 1 2 3)` (an infix expression with more than 2 children...). – Bart Kiers Nov 16 '12 at 15:20