I need to parse user input that defines queries to a system. At the heart of such queries are triplets (field name, comparison, value), which can also be combined to form complex queries; the idea is to restrict a result set to the entries that satisfy these queries. Here are 3 sample inputs:
field1 = simpleValueNoQuotes
field2 ~ "valueWithQuotes"
(field1 = simpleValueNoQuotes OR field2 ~ "valueWithQuotes") AND field3 = foobar
Users must quote a value if it contains whitespace or any reserved character, such as a double quote or a parenthesis.
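To make the quoting rule concrete, here is a rough Python sketch of it. It is not part of the actual system: the names `needs_quotes` and `quote` are made up for illustration, the whitespace set is copied from the WHITESPACE rule in the grammar below, and treating only double quotes and parentheses as reserved is an assumption.

```python
# Characters that force a value to be quoted (assumption: only these are reserved).
RESERVED = set('"()')
# Same characters as the grammar's WHITESPACE rule: \t, \n, \f, \r, space, NBSP.
WHITESPACE = set("\t\n\f\r \u00a0")


def needs_quotes(value: str) -> bool:
    """Return True if the value contains a reserved or whitespace character."""
    return any(ch in RESERVED or ch in WHITESPACE for ch in value)


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

Under these assumptions, `needs_quotes("simpleValueNoQuotes")` is false, while a value containing a space or a parenthesis would have to go through `quote` first.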
So far, my grammar has handled this well enough, but now a new requirement has come up: users should be allowed to omit the spaces, entering queries like field1=simpleValueNoQuotes. My grammar can't handle this, and I can't figure out why (this is my first project with ANTLR).
Here is my grammar in a slightly simplified form:
grammar simple;
querytree : query EOF;
query : subquery (operator subquery)* ;
subquery : leaf | composite;
operator : 'AND' | 'OR'; //upper case, matching the sample inputs above
leaf : fieldname comparison value;
value : DOUBLEQUOTE_DELIMITED_VALUE | SIMPLE_VALUE;
composite : leftParenthesis query rightParenthesis;
fieldname : 'field1' | 'field2'; //this has many keywords in reality
comparison : '=' | '~';
leftParenthesis : '(';
rightParenthesis : ')';
fragment
ESCAPE : '\\' ( '"' | '\\') ;
DOUBLEQUOTE_DELIMITED_VALUE
: '"' ( ~( '"' | '\\' ) | ESCAPE )* '"'
;
SIMPLE_VALUE
: ('\u0021'|'\u0023'..'\u0027'|'\u002A'..'\u007E'|'\u00A1'..'\uFFFF')*; /*all unicode characters except control characters, doublequotes, parentheses and whitespace defined below*/
WHITESPACE
: ('\u0009'|'\u000A'|'\u000C'|'\u000D'|'\u0020'|'\u00A0')+ {$channel = HIDDEN;} /*\t, \n, \f, \r, space, nonbreaking space*/
;
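While debugging, it may help to check which characters the SIMPLE_VALUE ranges actually cover. The following Python sketch transcribes the ranges by hand from the rule above (the helper name `in_simple_value` is made up). It only tests range membership and says nothing about how ANTLR itself tokenizes the input.

```python
# Inclusive code point ranges copied from the SIMPLE_VALUE lexer rule.
SIMPLE_VALUE_RANGES = [
    (0x0021, 0x0021),
    (0x0023, 0x0027),
    (0x002A, 0x007E),
    (0x00A1, 0xFFFF),
]


def in_simple_value(ch: str) -> bool:
    """Return True if the single character falls inside one of the ranges."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in SIMPLE_VALUE_RANGES)


# Print the result for the comparison operators and the reserved characters.
for ch in ['=', '~', '"', '(', ')', ' ', 'a']:
    print(repr(ch), in_simple_value(ch))
```

If the comparison characters turn out to be covered by the ranges, an unspaced input could plausibly be consumed as a single SIMPLE_VALUE token, though I have not confirmed that this is what ANTLR does here.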
Any ideas as to why this is able to parse field1 = simpleValueNoQuotes but unable to parse field1=simpleValueNoQuotes?