
Here is a basic grammar for simple nested expressions:

```
infix   :   prefix (INFIX_OP^ prefix)*;
prefix  :   postfix | (PREFIX_OP postfix) -> ^(PREFIX_OP postfix);
postfix :   INT (POSTFIX_OP^)?;

POSTFIX_OP : '!';
INFIX_OP :  '+';
PREFIX_OP : '-';
INT :   '0'..'9'*;
```

To create a list of these expressions, I could use the following:

```
list:   infix (',' infix)*;
```

Here we use the ',' as a delimiter.

I want to be able to build a list of expressions without any delimiter.

So, given the string `4 5 2+3 1 6`, I would like it to be interpreted as `(4) (5) ^(+ 2 3) (1) (6)`.

The problem is that both `4` and `2+3` start with the same symbol (INT), so I have a conflict. I'm trying to figure out how to resolve this.

EDIT

I've almost figured it out; I'm just having trouble coming up with the correct rewrite for one condition:

```
expr: (a=atom -> $a)
      (op='+' b=atom
          -> {$a.text != "+" && $b.text != "+"}? ^($op $expr $b) // infix
          -> {$b.text != "+"}? // HAVING TROUBLE COMING UP WITH THIS CORRECT REWRITE!
          -> $expr $op $b)*; // simple list

atom: INT | '+';
INT : '0'..'9'+;
```

This will parse `1+2+3++4+5+` as `^(+ ^(+ 1 2) 3) (+) (+) ^(+ 4 5) (+)`, which is what I want.

Now I'm trying to finish my rewrite rule so that `++1+2` will parse as `(+) (+) ^(+ 1 2)`. Overall, I want to take a list of tokens, find all the infix expressions, and leave the rest as a flat list.
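Outside of ANTLR, the grouping described here (build an infix node only when `+` has an INT on both sides, and keep every other `+` as its own list item) can be prototyped as a single left-to-right pass over the tokens. The Python sketch below is not an ANTLR rewrite rule, just a way to pin down the intended output. It assumes a hypothetical tokenizer that only knows integers and `+`, and it represents `^(+ a b)` as the tuple `('+', a, b)`.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: integers and '+' only; any other character is skipped.
    return re.findall(r"\d+|\+", text)

def group_infix(tokens):
    """Fold INT ('+' INT)* runs into left-associative tuples; keep other '+' as atoms."""
    out = []
    i, n = 0, len(tokens)
    while i < n:
        tok = tokens[i]
        if tok.isdigit():
            node = tok
            i += 1
            # Extend the infix chain only while '+' is immediately followed by an INT.
            while i + 1 < n and tokens[i] == "+" and tokens[i + 1].isdigit():
                node = ("+", node, tokens[i + 1])
                i += 2
            out.append(node)
        else:
            # A '+' without an operand on both sides stays a standalone list item.
            out.append(tok)
            i += 1
    return out

print(group_infix(tokenize("1+2+3++4+5+")))
# [('+', ('+', '1', '2'), '3'), '+', '+', ('+', '4', '5'), '+']
print(group_infix(tokenize("++1+2")))
# ['+', '+', ('+', '1', '2')]
```

The same pass also handles the delimiter-free input from the start of the question: `group_infix(tokenize("4 5 2+3 1 6"))` gives `['4', '5', ('+', '2', '3'), '1', '6']`.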

Question edited by Manishearth; asked by David James Ball.

1 Answer


There's a problem with your INT rule:

```
INT : '0'..'9'*;
```

which matches an empty string. It should always match at least one character:

```
INT : '0'..'9'+;
```
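The difference between the two INT rules can be illustrated with Python's `re` module standing in for the lexer rules. This is only an analogy, since ANTLR's lexer is not regex-based. `[0-9]*` accepts the empty string, so a lexer built on it can "match" an INT without consuming any input. `[0-9]+` cannot do that.

```python
import re

INT_STAR = re.compile(r"[0-9]*")  # like INT : '0'..'9'*;  (zero or more digits)
INT_PLUS = re.compile(r"[0-9]+")  # like INT : '0'..'9'+;  (one or more digits)

# The '*' version accepts an empty string as a complete "INT"...
print(INT_STAR.fullmatch("") is not None)  # True
# ...while the '+' version requires at least one digit.
print(INT_PLUS.fullmatch("") is not None)  # False

# Scanning "4 5": the '*' version also produces empty "tokens" (Python 3.7+ semantics).
print(INT_STAR.findall("4 5"))  # ['4', '', '5', '']
print(INT_PLUS.findall("4 5"))  # ['4', '5']
```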

Besides that, it seems to work just fine.

Given the grammar:

```
grammar T;

options {
  output=AST;
}

tokens {
  LIST;
}

parse      : list EOF -> list;
list       : infix+ -> ^(LIST infix+);
infix      : prefix (INFIX_OP^ prefix)*;
prefix     : postfix -> postfix
           | PREFIX_OP postfix -> ^(PREFIX_OP postfix)
           ;
postfix    : INT (POSTFIX_OP^)?;

POSTFIX_OP : '!';
INFIX_OP   : '+';
PREFIX_OP  : '-';
INT        : '0'..'9'+;
SPACE      : ' ' {skip();};
```

which parses the input:

```
4 5 2+3 1 6
```

into the following AST:

[AST image: `^(LIST 4 5 ^(+ 2 3) 1 6)`]
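`infix+` needs no delimiter because the `( ... )*` loop inside `infix` only continues while the next token is `+`. Any other token, such as the next INT, ends the current list element. The Python sketch below is a hand-written recursive-descent parser that mirrors the grammar's rules one-to-one, so it shows the same decision. It is an illustration, not the code ANTLR generates. The nested-tuple tree shape (`('+', a, b)` for `^(+ a b)`) is an assumption.

```python
import re

# Token kinds named after the lexer rules; spaces (and any other character) are skipped.
KINDS = {"!": "POSTFIX_OP", "+": "INFIX_OP", "-": "PREFIX_OP"}

def tokenize(text):
    return [("INT", t) if t.isdigit() else (KINDS[t], t)
            for t in re.findall(r"\d+|[!+-]", text)]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        # Kind of the current token, or "EOF" once the input is exhausted.
        return self.tokens[self.i][0] if self.i < len(self.tokens) else "EOF"

    def take(self):
        # Consume the current token and return its text.
        text = self.tokens[self.i][1]
        self.i += 1
        return text

    def parse_list(self):
        # list : infix+ -> ^(LIST infix+);
        items = [self.parse_infix()]
        while self.peek() != "EOF":
            items.append(self.parse_infix())
        return ("LIST", *items)

    def parse_infix(self):
        # infix : prefix (INFIX_OP^ prefix)*;
        # The loop continues only while the next token is '+'; anything else
        # (e.g. the next INT) ends this element, so no delimiter is needed.
        node = self.parse_prefix()
        while self.peek() == "INFIX_OP":
            op = self.take()
            node = (op, node, self.parse_prefix())
        return node

    def parse_prefix(self):
        # prefix : postfix | PREFIX_OP postfix -> ^(PREFIX_OP postfix);
        if self.peek() == "PREFIX_OP":
            op = self.take()
            return (op, self.parse_postfix())
        return self.parse_postfix()

    def parse_postfix(self):
        # postfix : INT (POSTFIX_OP^)?;
        if self.peek() != "INT":
            raise SyntaxError(f"expected INT, got {self.peek()}")
        node = self.take()
        if self.peek() == "POSTFIX_OP":
            node = (self.take(), node)
        return node

print(Parser(tokenize("4 5 2+3 1 6")).parse_list())
# ('LIST', '4', '5', ('+', '2', '3'), '1', '6')
```

The printed tuple has the same shape as `^(LIST 4 5 ^(+ 2 3) 1 6)`.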

EDIT

Introducing operators that can be used in both postfix and infix expressions will make your list ambiguous (well, in my version below, at least :)). So I'll keep the comma in there for this demo:

```
grammar T;

options {
  output=AST;
}

tokens {
  LIST;
  P_ADD;
}

parse        : list EOF -> list;
list         : expr (',' expr)* -> ^(LIST expr+);
expr         : postfix_expr;
postfix_expr : (infix_expr -> infix_expr) (ADD -> ^(P_ADD infix_expr))?;
infix_expr   : atom ((ADD | SUB)^ atom)*;
atom         : INT;

ADD   : '+';
SUB   : '-';
INT   : '0'..'9'+;
SPACE : ' ' {skip();};
```

In the grammar above, `+` as an infix operator takes precedence over its postfix version, as you can see when parsing input like `2+5+`:

[AST image: `^(LIST ^(P_ADD ^(+ 2 5)))`]
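The key decision in this grammar happens inside `infix_expr`'s loop. On seeing `+`, the parser must either continue the infix chain or leave the `+` for `postfix_expr`. Looking one token further ahead (is the next token an INT?) settles it. The Python sketch below is a hand-written recursive-descent version of the same grammar that makes that lookahead explicit. It illustrates the decision, not ANTLR's generated code, and the tuple tree shape is an assumption.

```python
import re

def tokenize(text):
    # INT, ADD ('+'), SUB ('-') and ','; whitespace is skipped, like the SPACE rule.
    return re.findall(r"\d+|[+\-,]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self, k=0):
        j = self.i + k
        return self.tokens[j] if j < len(self.tokens) else "EOF"

    def take(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def parse_list(self):
        # list : expr (',' expr)* -> ^(LIST expr+);
        items = [self.parse_postfix_expr()]
        while self.peek() == ",":
            self.take()
            items.append(self.parse_postfix_expr())
        if self.peek() != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return ("LIST", *items)

    def parse_postfix_expr(self):
        # postfix_expr : infix_expr (ADD -> ^(P_ADD infix_expr))?;
        node = self.parse_infix_expr()
        if self.peek() == "+":
            self.take()
            node = ("P_ADD", node)
        return node

    def parse_infix_expr(self):
        # infix_expr : atom ((ADD | SUB)^ atom)*;
        # '+' or '-' is taken as infix only when an INT follows it; otherwise the
        # loop stops and a trailing '+' is left for postfix_expr.
        node = self.parse_atom()
        while self.peek() in ("+", "-") and self.peek(1).isdigit():
            op = self.take()
            node = (op, node, self.parse_atom())
        return node

    def parse_atom(self):
        # atom : INT;
        if not self.peek().isdigit():
            raise SyntaxError(f"expected INT, got {self.peek()!r}")
        return self.take()

print(Parser(tokenize("2+5+")).parse_list())
# ('LIST', ('P_ADD', ('+', '2', '5')))
```

The result mirrors the `^(P_ADD ^(+ 2 5))` tree: the infix `+` wins whenever an operand follows it, and only the final `+` becomes postfix.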

Answered by Bart Kiers.
  • Bart, once again I must thank you for your endless contributions. Do you have any idea how to resolve a conflict when an infix operator and a postfix operator share the same symbol? In my grammar postfix binds most tightly, so postfix always wins: if `+` is shared by both infix and postfix, then `2+5` will parse as `^(+ 2) (5)` rather than `^(+ 2 5)`. How could I rewrite my grammar so that infix is the dominant rule whenever it can be parsed, and otherwise falls back to postfix? – David James Ball Nov 10 '12 at 19:22
  • Hmm, that's not the tree I'm looking for. In `2+5+`, the postfix operator should apply only to the `(5)`, not to `^(+ 2 5)`. – David James Ball Nov 10 '12 at 21:43
  • I thought you wanted to have the infix `+` to be the dominant operator... So, instead of `2+5+` being parsed as `^(P_ADD ^(+ 2 5))`, how would you like it being parsed instead? – Bart Kiers Nov 10 '12 at 21:52
  • OK, I think I haven't explained myself very well. In this question I've abstracted what I thought was the problem. I'm actually parsing MathML (per my other question), and I'm just trying to figure out a nesting structure that works and takes into account the precedence and form of the operators. I have a list of operators and their fix forms, and each operator is given a precedence number. For example, the `^` operator can be both infix and postfix: the infix form has precedence value 780 and the postfix form has value 880. http://www.w3.org/TR/MathML3/appendixc.html#oper-dict.entries-table – David James Ball Nov 10 '12 at 22:31
  • If I store a table of all the operators and their precedence values, can I use that somehow in the parser rule to give me the correct structure? – David James Ball Nov 10 '12 at 22:32
  • @DavidJamesBall, no, I don't see an easy way to make use of some lookup table to drive parser-decisions. I might have a closer look at MathML soon. If I do, and I get some (clever) ideas, I'll post back in your [original question about parsing MathML](http://stackoverflow.com/questions/13222862/parsing-mathml-operators-using-antlr). – Bart Kiers Nov 11 '12 at 13:15
  • Thanks again. As a side question, do you know how a rule can be written such that an operator becomes either an atom or a binary expression depending on the context? E.g., `5+2` should become `^(+ 5 2)`, but `+5+2` should become `(+) ^(+ 5 2)` and `+5+2+` should become `(+) ^(+ 5 2) (+)`. I only want to create an infix node when there are 2 children. – David James Ball Nov 12 '12 at 21:16
  • I've almost figured it out, just having trouble coming up with a rewrite rule. See EDIT. – David James Ball Nov 13 '12 at 21:17
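On the lookup-table question in the comments: an ANTLR grammar encodes precedence in its rule structure, but a hand-written parser can consult a table directly. The Python sketch below uses precedence climbing, a standard technique that does not come from this thread. Its operator table is hypothetical. The `^` entries reuse the 780 (infix) and 880 (postfix) values quoted above, and the `+` value is only a placeholder. The sketch reads an operator as infix when a number follows it, and otherwise as postfix if the table allows that form.

```python
import re

# Hypothetical operator table: symbol -> {form: precedence}.
# '^' uses the values quoted in the comments; the '+' value is a placeholder.
OPERATORS = {
    "+": {"infix": 275},
    "^": {"infix": 780, "postfix": 880},
}

def tokenize(text):
    return re.findall(r"\d+|[+^]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self, k=0):
        j = self.i + k
        return self.tokens[j] if j < len(self.tokens) else None

    def parse(self):
        node = self.parse_expr(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def parse_expr(self, min_prec):
        # Precedence climbing: only operators with precedence >= min_prec are absorbed here.
        left = self.parse_operand()
        while True:
            op = self.peek()
            forms = OPERATORS.get(op)
            if not forms:
                break
            nxt = self.peek(1)
            if "infix" in forms and nxt is not None and nxt.isdigit():
                prec = forms["infix"]
                if prec < min_prec:
                    break
                self.i += 1
                # prec + 1 makes equal-precedence infix operators left-associative.
                left = (op, left, self.parse_expr(prec + 1))
            elif "postfix" in forms:
                if forms["postfix"] < min_prec:
                    break
                self.i += 1
                left = ("postfix " + op, left)
            else:
                break
        return left

    def parse_operand(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        self.i += 1
        return tok

print(Parser(tokenize("2+5^")).parse())   # ('+', '2', ('postfix ^', '5'))
print(Parser(tokenize("2^3+1")).parse())  # ('+', ('^', '2', '3'), '1')
```

In `2+5^`, the postfix form of `^` (880) binds more tightly than infix `+`, so it attaches to `5` alone. That is the grouping asked for in the `2+5+` comment. Whether the actual MathML dictionary values give the desired trees for every operator would need checking against the linked table.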