
I'm trying to create a grammar for expression evaluation that differentiates between String, Boolean, and Numeric expressions.

Here's the relevant grammar so far:

functionInvocation: IDENTIFIER '(' ( IDENTIFIER '=' expression ( ',' IDENTIFIER '=' expression )* )? ')' ;

expression
    : stringExpression 
    | booleanExpression 
    | numericExpression
    ;

stringExpression
    : '(' stringExpression ')'
    | stringExpression '+' stringExpression
    | <assoc=right> booleanExpression '?' stringExpression ':' stringExpression 
    | STRING_LITERAL
    | IDENTIFIER
    | functionInvocation
    ;

booleanExpression
    : '(' booleanExpression ')'
    | NOT booleanExpression
    | booleanExpression LOGICAL_AND booleanExpression
    | booleanExpression LOGICAL_OR booleanExpression
    | numericExpression ( LESS_THAN | LESS_THAN_OR_EQUAL_TO | GREATER_THAN | GREATER_THAN_OR_EQUAL_TO ) numericExpression
    | stringExpression ( LESS_THAN | LESS_THAN_OR_EQUAL_TO | GREATER_THAN | GREATER_THAN_OR_EQUAL_TO ) stringExpression
    | numericExpression ( IS_EQUAL_TO | NOT_EQUAL_TO ) numericExpression
    | stringExpression ( IS_EQUAL_TO | NOT_EQUAL_TO ) stringExpression
    | <assoc=right> booleanExpression '?' booleanExpression ':' booleanExpression
    | stringExpression '=~' 'm/' REGEX_PATTERN '/' REGEX_OPTIONS
    | BOOLEAN_TRUE
    | BOOLEAN_FALSE
    | IDENTIFIER
    | functionInvocation 
    ;

numericExpression 
    : '(' numericExpression ')' #mathSubExpression
    | ( PLUS | MINUS | NOT )? mathAtom #unaryOperation
    | numericExpression POWER numericExpression #binaryOperation
    | numericExpression ( MULTIPLY | DIVIDE | MODULO ) numericExpression #binaryOperation
    | numericExpression ( ADD | SUBTRACT ) numericExpression #binaryOperation
    | numericExpression ( SHIFT_LEFT | SHIFT_RIGHT ) numericExpression #binaryOperation
    | numericExpression BITWISE_AND numericExpression #binaryOperation
    | numericExpression BITWISE_XOR numericExpression #binaryOperation
    | numericExpression BITWISE_OR numericExpression #binaryOperation
    ;
mathAtom
    : BINARY_LITERAL
    | FLOAT_LITERAL
    | HEX_LITERAL
    | INTEGER_LITERAL
    | OCTAL_LITERAL
    | SCIENTIFIC_NOTATION_LITERAL
    | IDENTIFIER
    | functionInvocation
    ;

I'm ending up with stringExpression and booleanExpression being mutually left-recursive because of the ternary operator in stringExpression.

Is there a way of expressing the intent that is not mutually left-recursive?

John Arrowwood
  • I'd make just a single expression rule where the parser accepts expressions like `123 + (a OR b)` and at a later stage (in a visitor or listener) perform the necessary type checks. After all, an expression that includes a `functionInvocation` can either return a boolean or a numeric type and is also not handled in the grammar. Generally the parser's job is only syntax. Semantics (like type checking) is done after parsing. Also see: https://stackoverflow.com/questions/51775995/where-types-should-be-checked-in-antlr-grammar-or-in-the-visitor – Bart Kiers Feb 20 '23 at 16:07
  • That's what I had originally, but there are bits of grammar (not shown) which specifically only take a string expression or a numeric expression, and just having a generic expression made for some very strange, very illegal things – John Arrowwood Feb 20 '23 at 18:06
  • They're not illegal if you check for them in a 2nd pass, after parsing (at least, they're not if you break off then, and return an error message). – Bart Kiers Feb 20 '23 at 18:21
  • The indirect left recursion is caused by both stringExpression and booleanExpression, not just the `?:` operator in stringExpression: there are alts in stringExpression that start with booleanExpression, and alts in booleanExpression that start with stringExpression. The easiest "fix" is to make stringExpression not left-recursive, since it has the fewest alts to substitute into the `?:` operator. Note that `ID + ID` is both a string and a numeric expression, so even if you fix the indirect left recursion, performance will likely be terrible. – kaby76 Feb 21 '23 at 01:21
  • And also be aware that `m/` will not be matched for an expression like `m/2` because the `m/` will be tokenized as a single (regex related) token. It will *not* become 2 separate tokens (identifier and division operator). Have a look at this ECMAScript grammar to see how that can be solved: https://github.com/antlr/grammars-v4/blob/master/javascript/ecmascript/ECMAScript.g4#L780 – Bart Kiers Feb 21 '23 at 07:50
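
The second-pass type check suggested in the comments could look roughly like the sketch below. It is written in Python and does not use the ANTLR runtime: the node classes (`Literal`, `Binary`, `Ternary`), the `type_of` function, and the operator spellings are all hypothetical stand-ins for whatever a visitor over a single `expression` rule would produce. Identifiers and function invocations are left out, since resolving their types would need a symbol table that the question does not show.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST nodes; a real visitor would build or walk something
# equivalent from the parse tree of a single generic `expression` rule.
@dataclass
class Literal:
    value: Union[str, bool, int, float]

@dataclass
class Binary:
    op: str
    left: "Node"
    right: "Node"

@dataclass
class Ternary:
    cond: "Node"
    then: "Node"
    other: "Node"

Node = Union[Literal, Binary, Ternary]

class TypeCheckError(Exception):
    pass

COMPARISONS = {"<", "<=", ">", ">=", "==", "!="}  # assumed spellings
LOGICAL = {"AND", "OR"}                            # assumed spellings

def type_of(node):
    """Return 'string', 'boolean' or 'numeric', or raise TypeCheckError."""
    if isinstance(node, Literal):
        # bool is tested first because bool is a subclass of int in Python
        if isinstance(node.value, bool):
            return "boolean"
        if isinstance(node.value, str):
            return "string"
        return "numeric"
    if isinstance(node, Ternary):
        if type_of(node.cond) != "boolean":
            raise TypeCheckError("condition of ?: must be boolean")
        then_type, other_type = type_of(node.then), type_of(node.other)
        if then_type != other_type:
            raise TypeCheckError(f"?: branches differ: {then_type} vs {other_type}")
        return then_type
    if isinstance(node, Binary):
        left, right = type_of(node.left), type_of(node.right)
        if left != right:
            raise TypeCheckError(f"'{node.op}' applied to {left} and {right}")
        if node.op in LOGICAL:
            if left != "boolean":
                raise TypeCheckError(f"'{node.op}' needs boolean operands")
            return "boolean"
        if node.op in COMPARISONS:
            # mirrors the grammar: only strings and numbers are compared
            if left == "boolean":
                raise TypeCheckError(f"'{node.op}' cannot compare booleans")
            return "boolean"
        if node.op == "+":
            # string concatenation or numeric addition
            if left == "boolean":
                raise TypeCheckError("'+' cannot be applied to booleans")
            return left
        if left != "numeric":
            raise TypeCheckError(f"'{node.op}' needs numeric operands")
        return "numeric"
    raise TypeCheckError(f"unknown node: {node!r}")
```

With this shape, a rule that only accepts a string expression can parse a generic expression and then require `type_of(node) == "string"`, reporting an error otherwise. An input such as `123 + (true OR false)` parses but is rejected in this pass.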
