I am trying to build a complex ANTLR4 grammar for a language where one of its parts consists of a set of logical expressions. This is the part of my grammar which defines such expressions:

expression:
    '(' expression ')'                          # parenthesisExp
    | NOT expression                            # notExp
    | expression arithmetic_operator expression # arithmeticExp
    | expression relational_operator expression # relationalExp
    | expression AND expression                 # andExp
    | expression OR expression                  # orExp
    | expression logical_operator expression    # logicalExp
    | SPACE? (variable | number) SPACE?         # atom;

logical_operator: IFF | IMPLIES | REQUIRES | EXCLUDES;
arithmetic_operator: ADD | SUB | MULT | DIV | MOD | POW | ASIG;
relational_operator:
    HIGHER_THAN
    | LOWER_THAN
    | HIGHER_EQUAL_THAN
    | LOWER_EQUAL_THAN
    | EQUAL
    | DISTINCT;
number: INT | DOUBLE;
variable: WORD ('.' LOWERCASE)? | LOWERCASE;

//lexer rules (just the ones related to the expression rule)

//arithmetic operators  
ADD: '+';
SUB: '-';
MULT: '*';
DIV: '/';
MOD: '%';
POW: '^';
ASIG: '=';

//logical operators

AND: 'AND';
OR: 'OR';
NOT: 'NOT';
IFF: 'IFF';
IMPLIES: 'IMPLIES';
REQUIRES: 'REQUIRES';
EXCLUDES: 'EXCLUDES';

//relational operators
HIGHER_THAN: '>';
LOWER_THAN: '<';
HIGHER_EQUAL_THAN: '>=';
LOWER_EQUAL_THAN: '<=';
EQUAL: '==';
DISTINCT: '!=';

LOWERCASE: [a-z][a-z0-9]*;
WORD: [A-Z][a-zA-Z0-9]*;
INT: '0' | [1-9][0-9]*;
DOUBLE: [1-9][0-9]* '.' [0-9]+;

SPACE: (' ' | '\t')+;           // Spaces and tabs are kept as tokens (not skipped) because spaces separate elements in other parts of the language.

I want variables or numbers to be the atoms, with parentheses having the highest priority, then NOT, followed by arithmetic, relational, AND, OR and finally the other logical operators. Everything works well until parentheses are used.

This expression:

A>3 AND B<5 IFF C OR D; //Expressions are closed by ;

Returns a correct output with everything parsed according to the priority established in the recursion. However, if I change it to this one:

(A>3 AND B<5 IFF C) OR D;

I get a "mismatched input 'OR' expecting ';'" error, which means the parser expects the expression to end at the closing parenthesis. It may not be recognizing the parenthesized expression as an actual expression, so when it is followed by "OR D" it doesn't match as the left operand of another expression.

Trying this one instead:

A>3 AND (B<5 IFF C OR D);

gives me the errors "extraneous input '(' expecting {LOWERCASE, WORD, INT, DOUBLE}" and "extraneous input ')' expecting {';', SPACE}".

I can't figure out what is causing this, as nothing seems to be wrong with the recursive rule definition.

pabpazjim
  • You're only allowing spaces around variables and numbers, so your spaces before and after the parentheses are not valid according to your grammar. – sepp2k Aug 18 '21 at 10:04
  • @sepp2k Thanks, it worked. Having to deal with spaces possibly appearing everywhere in the file got me so overwhelmed that I forgot to add them there. – pabpazjim Aug 18 '21 at 10:10
  • Why can't you just skip whitespace? – rici Aug 18 '21 at 11:35
  • @rici It's an old language I have to parse in order to serialize it into a new language. This old language contains one part where elements are separated by spaces (weird design choice), which forces me to deal with optional spaces appearing all over the file. – pabpazjim Aug 19 '21 at 12:10
  • Shell is a language where "elements are separated by spaces" except in arithmetic contexts. Yet I have never felt the need to feed space characters into a shell parser. (Lexer contexts are easier). Perhaps there is a different question whose answer might be useful. – rici Aug 19 '21 at 15:31
  • Just because the language required space separation, doesn’t mean you have to deal with it. The question is: if you toss out the whitespace and parse it, would you get the correct interpretation of the input? (It doesn’t sound as though you need to identify errors where whitespace might have been omitted) – Mike Cargal Aug 19 '21 at 16:39
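
The fix described in the first comment can be sketched as follows, assuming the rest of the grammar stays exactly as posted. The only change is that optional SPACE tokens are accepted around the parenthesized alternative and before NOT, not just around atoms. This is an illustration of the idea and has not been run against the full grammar, which is not shown in the question.

```antlr
// Sketch only: same rule as in the question, with SPACE? added to the
// first two alternatives. All other rules and tokens are unchanged.
expression:
    SPACE? '(' expression ')' SPACE?            # parenthesisExp   // spaces allowed before '(' and after ')'
    | SPACE? NOT expression                     # notExp           // space allowed before NOT
    | expression arithmetic_operator expression # arithmeticExp
    | expression relational_operator expression # relationalExp
    | expression AND expression                 # andExp
    | expression OR expression                  # orExp
    | expression logical_operator expression    # logicalExp
    | SPACE? (variable | number) SPACE?         # atom;
```

With this change, the space after ')' in "(A>3 AND B<5 IFF C) OR D;" and the space before '(' in "A>3 AND (B<5 IFF C OR D);" are consumed by the parenthesisExp alternative instead of being reported as unexpected input. The alternative raised in the later comments is to discard whitespace in the lexer (SPACE: (' ' | '\t')+ -> skip;) and remove every SPACE? from the parser rules; whether that is safe for the space-separated part of the language is not settled in the thread.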