I want to create a parser using antlr for the following string:
"1 AND (2 OR (3 AND 4)) AND 5" -> so i want to have AND and OR operations which should result in a tree after parsing was successful. this should result in the following tree:
AND
- 1
- OR
- 2
- AND
-3
-4
- 5
i also want to avoid unclear inputs like "1 AND 2 OR 3" as it is not clear how to construct the tree from that. And it also seems like the parser "accepts" input with trailing sings such as "1 AND 2asdf".
what i have so far is (not working as expected):
grammar code;
options {
language=CSharp3;
output=AST;
ASTLabelType=CommonTree;
//backtrack=true;
}
tokens {
ROOT;
}
@rulecatch {
catch {
throw;
}
}
@parser::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }
@lexer::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }
@lexer::members {
public override void ReportError(RecognitionException e) {
throw e;
}
}
public parse : exp EOF -> ^(ROOT exp);
exp
: atom
( And^ atom (And! atom)*
| Or^ atom (Or! atom)*
)?
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+
;
And
: 'AND' | 'and'
;
Or
: 'OR' | 'or'
;
WS : (' '|'\t'|'\f'|'\n'|'\r')+{ Skip(); };
Hope someone of you guys can help me get on the right track!
edit: and how can i archieve "1 AND 2 AND 3" to result in
AND
1
2
3
instead of
AND
AND
1
2
3
EDIT:
thanks for the great solution, it works like a charm except for one thing: when i call the parse() method on the following term "1 AND (2 OR (1 AND 3) AND 4" (closing bracket missing) the parser still accepts the input as valid.
this is my code so far: grammar code;
options {
language=CSharp3;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ROOT;
}
@rulecatch {
catch {
throw;
}
}
@lexer::members {
public override void ReportError(RecognitionException e) {
throw e;
}
}
public parse
: exp -> ^(ROOT exp)
;
exp
: atom
( And^ atom (And! atom)*
| Or^ atom (Or! atom)*
)?
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+
;
And
: 'AND' | 'and'
;
Or
: 'OR' | 'or'
;
WS : (' '|'\t'|'\f'|'\n'|'\r')+{ Skip(); };
edit2: i just found another problem with my grammar: when i have input like "1 AND 2 OR 3" the grammar gets parsed just fine, but it should fail because either the "1 AND 2" needs to be inside brackets or the "2 OR 3" part. i dont understand why the parser runs through as in my opinion this grammar should really cover that case. is there any sort of online-testing-environment or so to find the problem? (i tried antlrWorks but the errors given there dont lead me anywhere...)
edit3: updated the code to represent the new grammar like suggested.
i still have the same problem that the following grammar:
public parse : exp EOF -> ^(ROOT exp);
doesnt parse to the end.. the generated c# sources seem to just ignore the EOF... can you provide any further guidance on how i could resolve the issue?
edit4 i still have the same problem that the following grammar:
public parse : exp EOF -> ^(ROOT exp);
doesnt parse to the end.. the generated c# sources seem to just ignore the EOF... can you provide any further guidance on how i could resolve the issue?
the problem seems to be in this part of the code:
EOF2=(IToken)Match(input,EOF,Follow._EOF_in_parse97);
stream_EOF.Add(EOF2);
When i add the following code (just a hack) it works...
if (EOF2.Text == "<missing EOF>") {
throw new Exception(EOF2.Text);
}
can i change anything so the parser gets generated correclty from the start?