I want to create a simple criteria expression parser with antlr3

Updated: I separated the AND and OR expression rules so that AND and OR sit at different levels of the hierarchy, but this raised another problem. If the expression is something like a = 1 and b = 2 and c = 3, the current implementation produces the following tree:

       =      =
 (a = 1)(b = 2)(c = 3)
But I want to generate it as follows:
          =       =
    (a = 1)(b = 2)
               (c = 3)
The first "and" should bind before the second, because I want to treat every expression as a left expression and a right expression.

I think I need to rewrite the "subcond" rule so that a = 1 and b = 2 and c = 3 is parsed as (a = 1 and b = 2) and c = 3,

but I have tried many times with no luck. Does anybody have an idea how to achieve this? Thanks.


My goal is to parse sentences in the style of a SQL WHERE clause and build an AST to walk through.

For example:

    a = 1 and (b = 2 or c = 3)            //This one can parse correctly.
    a = 1 and ((b = 2 or c = 3) or d = 4) //This one cannot parse correctly, missing last d = 4 in the tree. 
                                          //Tree is not correct.

My current grammar file cannot parse the complex condition above. I'm new to ANTLR and not sure how to modify my grammar to handle it correctly. Can someone help with this? Any suggestions or comments are appreciated.

and my grammar as follows (Updated according to the comments. Warning issue resolved.):

grammar CriteriaExpression;

options {
  output       = AST;
  ASTLabelType = CommonTree;
  language     = Java;
}

tokens {
  AND    = 'and';
  OR     = 'or';
  LPAREN = '(';
  RPAREN = ')';
}

@lexer::header {
package com.antlr;
}

@parser::header {
package com.antlr;
}

eval
:
expression
;

expression : andExp (OR^ andExp)* ;

andExp : subcond (AND^ subcond)* ;

subcond : LPAREN expression RPAREN |atom ;

atom
  :
  EXPR OPERATOR EXPR
  ;

OPERATOR
  :
  '='| '<>'| '!='| '<='| '!>'| '<'| '>='| '!<'| '>'| 'like'
  ;

EXPR
  :
  ('a'..'z'| 'A'..'Z'| '0'..'9')+
  ;

 WILDCARD
  :
  '%'
  ;

WS
  :
  // '+' rather than '*': a lexer rule should not be able to match the empty string
  ('\t'| ' '| '\r'| '\n'| '\u000C')+
   {$channel = HIDDEN;}
  ;

[Image: generated tree for ((a=1))]

[Image: generated tree for a = 1 and ((b = 2 or c = 3) or d = 4)]

phyerbarte
  • show incoming file for your example – Aliaksei Bulhak Feb 20 '13 at 08:04
  • Hi, @Aleksei Bulgak, what do you mean by incoming file? The example is just a possible value I thought of. The real input string could be much more complex and could be a mixed combination. Thanks. – phyerbarte Feb 20 '13 at 08:15
  • you give this link in-complete tree. with tree – Aliaksei Bulhak Feb 20 '13 at 08:16
  • @Aleksei Bulgak, the in-complete tree is generated by antlrworks 1.4.3 with the example a = 1 and ((b = 2 or c = 3) or d = 4), I think the tree is missing the last part d=4 and not sure how to fix it. – phyerbarte Feb 20 '13 at 08:22
  • Instead of completely rewriting your already answered question, please create a new one. Or is your original question not answered? – Bart Kiers Feb 22 '13 at 13:08
  • Hi @Bart Kiers, the original question is already answered; the current problem builds on the original question. I will create a new question. – phyerbarte Feb 22 '13 at 13:16

2 Answers

Maybe I'm wrong, but I think your problem is connected with LPAREN* something RPAREN*. With that rule you can write something like ((something ) and ANTLR considers it valid, because the number of LPAREN tokens is not tied to the number of RPAREN tokens. So maybe use something like this:

complex
  : LPAREN (complex | subcond) RPAREN
  ;

But I will say it again, maybe I'm wrong

UPDATE

change this:

subcond
  : 
  //atom (( AND | OR )^ atom)*
  LPAREN* atom RPAREN*
  ;

to this:

subcond
  : 
  LPAREN (subcond|atom) RPAREN
  ;

With this you can now write something like ((a=1)).

Aliaksei Bulhak
  • Great! Your answer opened my mind, and it resolves the warning. But I still think the tree does not reflect the parentheses priority correctly. – phyerbarte Feb 20 '13 at 08:39
  • After the update, the tree is generated completely, but it still cannot describe parentheses priority. That should be another problem. – phyerbarte Feb 20 '13 at 08:55
  • Thanks @Aleksei Bulgak, your update handles ((a=1)), but the grammar does not seem flexible enough. Check my updated grammar; I think it is correct now. – phyerbarte Feb 20 '13 at 13:33
One flaw in your grammar is the rule

expression
  :
  LPAREN* subcond RPAREN* (( AND | OR )^ LPAREN* subcond RPAREN*)
  ;

Since you can have any number of LPAREN or RPAREN tokens, there is no guarantee that they are matched. I suggest using something like

expression
  : subcond (( AND | OR )^ subcond)?
  ;

and for subcond

subcond
  : atom (( AND | OR )^ atom)*
  | LPAREN expression RPAREN
  ;

Ideally, you should also have separate rules for AND and OR expressions to have the correct precedence in your parse tree.
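
To illustrate the separate-rules idea, here is a minimal hand-written recursive-descent sketch in plain Java. It is not ANTLR-generated code; the class name `PrecedenceSketch` and the prefix-string output are made up for illustration, and it only understands the `=` operator. Each grammar rule becomes one method, with OR at the lowest level and AND one level below it, and each loop folds its operands to the left so every operator node has exactly two children. I believe ANTLR 3's `^` operator inside `( ... )*` builds the same left-nested shape, provided each atom is itself a single subtree (for example `EXPR OPERATOR^ EXPR`), but treat that as an assumption to check in ANTLRWorks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of a two-level precedence parser (hypothetical, not ANTLR output). */
class PrecedenceSketch {
    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceSketch(List<String> tokens) { this.tokens = tokens; }

    /** Parses the input and returns the tree in LISP-like prefix form. */
    static String parse(String input) {
        // Pad parentheses and '=' with spaces so a plain whitespace split tokenizes them.
        String spaced = input.replace("(", " ( ").replace(")", " ) ").replace("=", " = ");
        List<String> tokens = new ArrayList<>();
        for (String t : spaced.trim().split("\\s+")) {
            tokens.add(t);
        }
        PrecedenceSketch p = new PrecedenceSketch(tokens);
        String tree = p.expression();
        if (p.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        }
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    // expression : andExp (OR andExp)*   -- lowest precedence, ends up nearest the root
    private String expression() {
        String left = andExp();
        while (peek().equalsIgnoreCase("or")) {
            next();
            left = "(or " + left + " " + andExp() + ")"; // left fold: previous tree becomes the left child
        }
        return left;
    }

    // andExp : subcond (AND subcond)*   -- binds tighter than OR
    private String andExp() {
        String left = subcond();
        while (peek().equalsIgnoreCase("and")) {
            next();
            left = "(and " + left + " " + subcond() + ")";
        }
        return left;
    }

    // subcond : '(' expression ')' | atom   -- recursion keeps parentheses matched
    private String subcond() {
        if (peek().equals("(")) {
            next();
            String inner = expression();
            if (!next().equals(")")) throw new IllegalArgumentException("expected ')'");
            return inner;
        }
        return atom();
    }

    // atom : EXPR '=' EXPR   -- only '=' is handled in this sketch
    private String atom() {
        String lhs = next();
        if (!next().equals("=")) throw new IllegalArgumentException("expected '='");
        String rhs = next();
        return "(= " + lhs + " " + rhs + ")";
    }
}
```

With this sketch, `PrecedenceSketch.parse("a = 1 and b = 2 and c = 3")` returns `(and (and (= a 1) (= b 2)) (= c 3))`, so every operator has a left and a right operand, and in `a=1 OR b=2 AND c=3` the AND node ends up below the OR node.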

Update: In your updated grammar, again you are using LPAREN* and RPAREN* which won't give you properly balanced trees. You need to model multiple parens like ((a = 1)) with recursion, like I described in my example above. This would give a tree like

((a = 1))
  ^---^--- ATOM
 ^-----^-- Subcond -> Expression
^-------^- Subcond -> Expression

So the tree would look like this:

Expression "((a = 1))"
^
Subcond "(a = 1)"
^
Expression "(a = 1)"
^
Subcond "a = 1"
^
ATOM "a = 1"
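
As a rough illustration of why recursion keeps the parentheses matched, here is a small hand-written Java sketch. It is not ANTLR output; `ParenSketch`, `depth` and `accepts` are hypothetical names, and the atom is simplified to "any run of characters that contains no parenthesis". Every `(` consumed by a call to `subcond()` must be closed by a `)` in that same call, which is exactly what `LPAREN* ... RPAREN*` cannot enforce.

```java
/** Hand-written sketch of the rule: subcond : '(' subcond ')' | atom (hypothetical, not ANTLR output). */
class ParenSketch {
    private final String src;
    private int pos = 0;

    private ParenSketch(String src) { this.src = src; }

    /** Returns the parenthesis nesting depth, or throws if the parentheses do not match. */
    static int depth(String input) {
        ParenSketch p = new ParenSketch(input.replaceAll("\\s+", ""));
        int d = p.subcond();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected '" + p.src.charAt(p.pos) + "' at " + p.pos);
        }
        return d;
    }

    /** Convenience wrapper: true if the input is accepted by the recursive rule. */
    static boolean accepts(String input) {
        try {
            depth(input);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // subcond : '(' subcond ')' | atom
    private int subcond() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++; // consume '('
            int inner = subcond();
            // The ')' belonging to the '(' above has to appear right here.
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ')'");
            }
            pos++; // consume ')'
            return inner + 1;
        }
        atom();
        return 0;
    }

    // atom : simplified stand-in for EXPR OPERATOR EXPR (everything up to the next parenthesis)
    private void atom() {
        int start = pos;
        while (pos < src.length() && src.charAt(pos) != '(' && src.charAt(pos) != ')') {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("expected an atom at " + pos);
        }
    }
}
```

Here `ParenSketch.depth("((a = 1))")` returns 2, one level per recursive call, while `ParenSketch.accepts("((a = 1)")` and `ParenSketch.accepts("(a = 1))")` both return false.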
tehlexx
  • Hi, @tehlexx, yes, you are right, we cannot guarantee the number of LPAREN or RPAREN tokens in the input string. But I think the key point is that this grammar cannot describe the parentheses priority, isn't it? Maybe I'm wrong. I updated my grammar, and it can now generate a complete tree for the complex example a = 1 and ((b = 2 or c = 3) or d = 4), but I'm not sure whether the priority in this tree is correct. – phyerbarte Feb 20 '13 at 09:03
  • The way I understood your question is that the `*PAREN`s must be matched, so if there are 2 `LPAREN`s there must be exactly 2 `RPAREN`s. The above grammar should also cover the scenario where there are no `PAREN`s at all; it will then take the direct path `expression -> subcond -> ATOM`, so this should work as well. The key is to model the parentheses with recursion, and not with `PAREN*`. – tehlexx Feb 20 '13 at 09:10
  • Hi, @tehlexx, I got your point and updated the grammar; I think the approach should be correct now. Very grateful. The generated tree for ((a=1)) seems deeper than the one you described. Does my approach match your description now? Can you help to confirm? Thanks. – phyerbarte Feb 20 '13 at 13:30
  • As far as I can tell, the trees look good! What you should also check though is the tree for `a=1 OR b=2 AND c=3` and `a=1 AND b=2 OR c=3`. In both cases the `AND` nodes should be below the `OR` nodes. This is important, as `AND` usually takes precedence over `OR`. – tehlexx Feb 20 '13 at 14:14
  • Hi @tehlexx, many thanks, I updated my grammar according to your suggestion. It now supports AND taking precedence over OR. But I got a new problem: I want to parse every expression into two parts, a left expression and a right expression, so if the expression is something like a=1 and b=2 and c=3, then the tree would have 3 leaves, which will be a problem when I walk the tree. Is it possible to rewrite the rule so that a=1 and b=2 and c=3 -> (a=1 and b=2) and c=3, or even ((a=1 and b=2) and c=3) and d=4? Maybe it's not possible. Thanks anyway. – phyerbarte Feb 22 '13 at 12:56
  • I think `a=1 and b=2 and c=3` is the same as `(a=1 and b=2) and c=3` or `a=1 and(b=2 and c=3)`, so it shouldn't matter in which order you are evaluating the parse tree. I'm not sure if I understand what you mean correctly though, so more context would probably help. Maybe it's appropriate to raise a new question. – tehlexx Feb 22 '13 at 14:09