
I am new to ANTLR and am working on a parser for SAS code that mainly consists of IF-THEN/ELSE IF statements. I have written the following grammar, but I get errors in IntelliJ when I try to run it with a sample application.

Grammar:

grammar SASDTModel;

parse
 : if_block+
 | score_block
 ;

//Model
// : If_block+
// | Score_block
// ;

if_block
 : (if_statement|if_in_block)
 | else_if_statement+
 | else_statement
 ;

if_statement
 : IF '(' if_condition ')' THEN Identifier'='Value ';'
 | IF Identifier'='Value THEN Identifier'='Value ';'
 ;
else_if_statement
 : ELSEIF '(' if_condition ')' THEN Identifier'='Value ';'
 | ELSEIF Identifier'='Value THEN Identifier'='Value ';'
 ;

if_condition
 : Value ComparisionOperators Identifier ComparisionOperators Value
 | Value ComparisionOperators Value;


else_statement
 : ELSE Identifier'='Value ';'
 ;

if_in_block
 : IF Identifier IN '(' StringArray ')' THEN Identifier'='Value ';'
 ;

score_block
 : Identifier'='Arithmetic_expression ';'
 ;

Arithmetic_expression:
 | ( ArithmeticOperators '(' Value ')' )+
 | ( ArithmeticOperators '(' Value ArithmeticOperators Identifier ')' )+
 ;
WS : ( ' ' | '\t' | '\r' | '\n' )-> channel(HIDDEN);
//WS : [ \t\n\r]+ -> channel(HIDDEN) ;
//WS : (' ' | '\t')+ -> channel(HIDDEN);
//COMMENT    :   '/*' .*? '*/'    -> skip ;
//LINE_COMMENT    :   '*' ~[\r\n]* -> skip ;

ArithmeticOperators:
 | '+'
 | '-'
 | '*'
 | '/'
 | '**'
 ;

ComparisionOperators
 : '=='
 | '<'
 | '>'
 | '<='
 | '>='
 ;

IF: 'IF' | 'if' ;
ELSE: 'ELSE' | 'else' ;
ELSEIF: 'ELSE IF' | 'else if' ;
THEN: 'THEN' | 'then';
IN: 'IN' | 'in';


Value : INT
 | DOUBLE
 | '-'DOUBLE
 | '-'INT
 | Identifier
 |'null';


INT : [0-9];
DOUBLE : INT+ PT INT+
    | PT INT+
    | INT+
    ;
PT  : '.';

Identifier  : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*  ;

StringArray : (('\'')(Value)('\''))+; 

Input:

if  scored = null then  scored = -0.05;
else if ( 0 <  scored <= 300 ) then scored = -0.5;
else if ( 300 < scored <= 500 ) then scored = -0.4;
else if ( 500 < scored <= 800 ) then scored = -0.8;
else if ( 800 < scored <= 1000 ) then  scored =  0.9;
else if ( scored > 1000 ) then  scored =  1.735409628;
else scored = 0;

Errors I am getting:

line 1:4 no viable alternative at input 'IF  scored'
line 1:61 mismatched input '<=' expecting ')'
line 1:112 mismatched input '<=' expecting ')'
line 1:163 mismatched input '<=' expecting ')'
line 1:214 mismatched input '<=' expecting ')'
line 1:276 mismatched input 'scored' expecting Identifier
line 1:303 mismatched input 'scored' expecting Identifier

All the errors are reported on line 1 because I preprocess the SAS code: I remove all comments and convert it into a single line.

After preprocessing, the input becomes:

IF scored = null THEN scored = -0.05;ELSE IF ( 0 < scored <= 300 ) THEN scored = -0.5;ELSE IF ( 300 < scored <= 500 ) THEN scored = -0.4;ELSE IF ( 500 < scored <= 800 ) THEN scored = -0.8;ELSE IF ( 800 < scored <= 1000 ) THEN scored = 0.9;ELSE IF ( scored > 1000 ) THEN scored = 1.735409628;ELSE scored = 0;
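For reference, the comment-removal and single-line step could look roughly like the Java sketch below. It is an assumption, not the preprocessing code used in the question: the class and method names are hypothetical, only `/* ... */` block comments are removed (SAS `* ... ;` statement comments and comment markers inside quoted strings are not handled), and, unlike the preprocessed text above, keyword case is left unchanged.

```java
public class SasPreprocessor {

    // Hypothetical sketch: strips /* ... */ block comments and joins the code
    // into a single line. SAS "* ... ;" statement comments are NOT handled.
    static String preprocess(String sasCode) {
        // (?s) lets '.' match line breaks; the lazy '.*?' stops at the first "*/".
        // The comment is replaced by a space so the tokens around it stay apart.
        String withoutComments = sasCode.replaceAll("(?s)/\\*.*?\\*/", " ");
        // Collapse every run of whitespace (line breaks included) into one space.
        return withoutComments.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String input = "if  scored = null then  scored = -0.05; /* default */\n"
                + "else scored = 0;\n";
        // prints: if scored = null then scored = -0.05; else scored = 0;
        System.out.println(preprocess(input));
    }
}
```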

Ajay Sant
  • Missing parentheses perhaps? `if (scored = null) ...` – Seelenvirtuose Feb 01 '18 at 12:04
  • I have a second alternative for the case without parentheses, so I think it should still match: **IF Identifier'='Value THEN Identifier'='Value ';'** Please correct me if my understanding is wrong. – Ajay Sant Feb 01 '18 at 12:17
  • Hmm. You are correct. But maybe try it out to see whether you get a different behavior. – Seelenvirtuose Feb 01 '18 at 12:25
  • After I added the parentheses, the error changed to **line 1:27 mismatched input 'scored' expecting Identifier**. I am still confused by the new error, though. – Ajay Sant Feb 01 '18 at 12:49
  • This means that the parser did not recognize the second alternative in your `if_statement`. I am not familiar enough with ANTLR to help you further, but maybe you now have some ideas about how to step through it. – Seelenvirtuose Feb 01 '18 at 13:49

3 Answers


Here are a couple of things that might be causing problems:

  • by making StringArray : (('\'')(Value)('\''))+; a lexer rule, you will only match 'foo123mu' (values without spaces). You should make StringArray a parser rule (and then Value should also become a parser rule)
  • your ELSEIF rule (ELSEIF: 'ELSE IF' | 'else if' ;) is rather fragile: whenever there are two or more spaces (or a tab or line break) between ELSE and IF, the rule will not match. You should remove this rule and use the existing ELSE and IF rules in your parser rule(s)
  • the rules ArithmeticOperators and Arithmetic_expression match empty strings. Lexer rules must never match empty strings (the lexer could produce an infinite number of empty-string tokens)
  • the lexer rule Arithmetic_expression should be a parser rule: whenever lexer rules are used to "glue" other tokens together, you should "promote" them to parser rules
  • your naming convention for lexer rules is inconsistent: use either PascalCase or UPPER_CASE, not both
  • as already mentioned, INT : [0-9]; should be INT : [0-9]+; otherwise 4 would be tokenised as an INT and 42 as a DOUBLE
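Several of these points (the fragile ELSEIF rule and the INT/DOUBLE split) come down to how an ANTLR lexer chooses between rules: the longest match wins, and when two rules match the same number of characters, the rule declared first wins. The Java program below is a simplified, hypothetical model of that behaviour built on `java.util.regex`, not ANTLR itself; it ignores fragments, modes and the implicit tokens created from literals in parser rules, and it only includes some of the question's rules. The same tie-breaking is also a likely explanation for the "mismatched input 'scored' expecting Identifier" errors: the question's Value rule has an Identifier alternative and is declared before Identifier, so a plain name such as scored comes out as a Value token.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerTieBreakDemo {

    // Simplified model of ANTLR 4 token selection: at each position the rule
    // with the longest match wins, and on a tie the rule declared first wins.
    // Rules are passed in declaration order (LinkedHashMap keeps that order).
    // WS tokens are dropped, as if they went to the hidden channel.
    static List<String> tokenize(Map<String, String> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so an equally long match from a later rule loses.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestRule.equals("WS")) {
                tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    // Some of the question's lexer rules, in the question's order, as Java
    // regexes. Java alternation takes the first alternative that matches,
    // so longer alternatives are listed first.
    static Map<String, String> questionRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("WS", "[ \\t\\r\\n]");
        rules.put("IF", "IF|if");
        rules.put("ELSE", "ELSE|else");
        rules.put("ELSEIF", "ELSE IF|else if");
        rules.put("Value", "-?([0-9]+\\.[0-9]+|\\.[0-9]+|[0-9]+)|[a-zA-Z_][a-zA-Z_0-9]*|null");
        rules.put("Identifier", "[a-zA-Z_][a-zA-Z_0-9]*");
        return rules;
    }

    // The question's INT and DOUBLE rules on their own.
    static Map<String, String> numberRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("WS", "[ ]");
        rules.put("INT", "[0-9]");
        rules.put("DOUBLE", "[0-9]+\\.[0-9]+|\\.[0-9]+|[0-9]+");
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(questionRules(), "ELSE IF"));   // [ELSEIF:ELSE IF]
        System.out.println(tokenize(questionRules(), "ELSE  IF"));  // [ELSE:ELSE, IF:IF]
        System.out.println(tokenize(questionRules(), "IF scored")); // [IF:IF, Value:scored]
        System.out.println(tokenize(numberRules(), "4 42"));        // [INT:4, DOUBLE:42]
    }
}
```

With two spaces the ELSEIF literal no longer matches, so the parser receives separate ELSE and IF tokens; and because Value is declared before Identifier, the parser never sees an Identifier token for scored.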

These are just a few of the things I saw while reading your question, so there may be more things incorrect. I suggest you first take the time to learn a bit more about ANTLR before trying to write a SAS grammar. Or, better yet, try to find an existing (ANTLR) grammar for this language instead of writing your own.

Here's an existing one you could take a look at: https://github.com/xueqilsj/sas-grammar (no idea how accurate it is)

Bart Kiers
  • For debugging purposes I am using your blog post "https://medium.com/@bkiers/debugging-antlr-4-grammars-58df104de5f6". From it I have taken the code to print the tokens and the parse tree. I am able to print the lexer tokens, but I get an error when printing the parse tree: `SASDTModelParser testParser = new SASDTModelParser(tokens); ParserRuleContext context = testParser. parser.${4}(); String tree = context.toStringTree(parser); printPrettyLispTree(tree);` In the above code I get an error in `testParser. parser.${4}();`. Can you help with the corresponding Java code? Thanks – Ajay Sant Feb 06 '18 at 12:00
  • I solved the above problem as follows: `SASDTModelParser testParser = new SASDTModelParser(tokens); ParserRuleContext context = testParser.parse(); String tree = context.toStringTree(testParser); printPrettyLispTree(tree);` – Ajay Sant Feb 06 '18 at 16:23
  • @Ajay I just tested my bash script, and I have no problem running and debugging a test grammar. I'm guessing you didn't use my script as I outlined on Medium but copy-pasted parts of it. Anyway, good to hear you got things working. – Bart Kiers Feb 06 '18 at 17:10
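The printPrettyLispTree helper used in these comments appears to come from Bart's blog post, and its code is not shown here. As a rough, hypothetical stand-in (class and method names are made up), the Java method below indents the string returned by `toStringTree(parser)`, one node per line. It relies on that LISP format separating children with single spaces, so token texts that contain spaces (such as the question's 'ELSE IF' token) or parentheses other than a lone `(` or `)` are not handled. `String.repeat` requires Java 11 or later.

```java
public class LispTreePrinter {

    private final StringBuilder out = new StringBuilder();
    private final StringBuilder atom = new StringBuilder();
    private int depth = 0;
    private boolean atomIsRuleName = false;

    // Writes the pending atom on its own line, indented by the current depth.
    // A rule name opens a subtree, so everything after it is indented further.
    private void flushAtom() {
        if (atom.length() == 0) {
            return;
        }
        out.append("  ".repeat(depth)).append(atom).append('\n');
        atom.setLength(0);
        if (atomIsRuleName) {
            depth++;
            atomIsRuleName = false;
        }
    }

    // In the LISP format a subtree is "(ruleName child child ...)". A '('
    // directly followed by text opens a subtree; a ')' directly after text or
    // another ')' closes one. Literal '(' and ')' tokens are surrounded by
    // spaces (or a closing ')'), so they are printed as ordinary leaves.
    public static String prettyPrint(String tree) {
        LispTreePrinter p = new LispTreePrinter();
        for (int i = 0; i < tree.length(); i++) {
            char c = tree.charAt(i);
            char prev = i > 0 ? tree.charAt(i - 1) : ' ';
            char next = i + 1 < tree.length() ? tree.charAt(i + 1) : ' ';
            if (c == '(' && next != ' ' && next != ')') {
                p.atomIsRuleName = true;   // the next atom is a rule name
            } else if (c == ')' && prev != ' ') {
                p.flushAtom();
                p.depth--;                 // the current subtree is finished
            } else if (c == ' ') {
                p.flushAtom();             // children are space-separated
            } else {
                p.atom.append(c);
            }
        }
        p.flushAtom();
        return p.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(prettyPrint(
                "(parse (if_block (if_statement IF ( (if_condition x) ) THEN y)) <EOF>)"));
    }
}
```

In the real program the argument would be the `tree` string from `context.toStringTree(testParser)`.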

The syntax of your input is incorrect: == should be used instead of =.

UPDATE:

Also, although the syntax of INT and DOUBLE should work, it would be better expressed like so:

INT : [0-9]+;
DOUBLE : INT PT INT
    | PT INT
    | INT
    ;

otherwise, 300 would be identified as a DOUBLE, not as an INT.

UPDATE 2

As @Raven has commented:

INT : [0-9]+;
DOUBLE : INT PT INT
    | PT INT
    ;
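The difference between the question's rules and the two updates can be checked with a small Java sketch (hypothetical class, method and field names). It assumes the text is exactly one token and only applies the tie-breaking rule: among the rules that match the whole text, the one declared first wins. With INT declared before DOUBLE, any digits-only text is claimed by INT, which is why DOUBLE's last `INT` alternative can never be used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class NumberRuleCheck {

    // For a text that forms exactly one token, returns the rule that claims it:
    // among the rules matching the whole text (equally long matches), the one
    // declared first wins.
    static String winningRule(Map<String, Pattern> rules, String text) {
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(text).matches()) {
                return rule.getKey();
            }
        }
        return "no rule";
    }

    // Builds an ordered rule map: INT first, then DOUBLE, as in the grammars above.
    static Map<String, Pattern> rules(String intRegex, String doubleRegex) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("INT", Pattern.compile(intRegex));
        rules.put("DOUBLE", Pattern.compile(doubleRegex));
        return rules;
    }

    // The question: INT : [0-9];
    static final Map<String, Pattern> QUESTION =
            rules("[0-9]", "[0-9]+\\.[0-9]+|\\.[0-9]+|[0-9]+");
    // First update: INT : [0-9]+; DOUBLE still ends with "| INT"
    static final Map<String, Pattern> UPDATE_1 =
            rules("[0-9]+", "[0-9]+\\.[0-9]+|\\.[0-9]+|[0-9]+");
    // Second update (Raven's suggestion): the "| INT" alternative is removed
    static final Map<String, Pattern> UPDATE_2 =
            rules("[0-9]+", "[0-9]+\\.[0-9]+|\\.[0-9]+");

    public static void main(String[] args) {
        for (String text : new String[] {"300", "0.05", ".5"}) {
            System.out.println(text + ": " + winningRule(QUESTION, text)
                    + " / " + winningRule(UPDATE_1, text)
                    + " / " + winningRule(UPDATE_2, text));
        }
    }
}
```

For 300 this prints DOUBLE for the question's rules and INT for both updates; for 0.05 and .5 all three versions give DOUBLE.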
Maurice Perry
  • I have replaced = with == in the input, but I am still getting the same error: "line 1:3 no viable alternative at input 'IF scored'". The column number changed because I deleted an extra space between IF and scored. – Ajay Sant Feb 01 '18 at 11:53
  • You could remove the last `INT` part from the `DOUBLE` rule, as it will never be matched (such content would be matched directly as `INT`). – Raven Feb 01 '18 at 16:42

I have completed my grammar and resolved all the errors thanks to @Bart, @Seelenvirtuose and @Maurice.

The following ANTLR grammar parses SAS IF/ELSE and simple assignment statements.

grammar SASDTModel;

parse : block+ EOF;

block
 : if_block+                # oneOrMoreIfBlock
 | assignment_block+        # assignmentBlocks
 ;

if_block
 : if_statement (else_if_statement)* else_statement?
 ;

/*nested_if_else_statement
 : If if_condition Then Do? ';'? if_statement (else_if_statement)* else_statement? End? ';'?
 ;*/

if_statement
 : If '('? if_condition ')'? Then if_block                                          # nestedIfStatement
 | If '('? if_condition ')'? Then expression Equal expression ';'                   # ifStatement
 | If expression In '(' expression_list+ ')' Then expression Equal expression ';'   # ifInBlock
 ;

else_if_statement
 : Else If '('? if_condition ')'? Then expression Equal expression ';'                  # elseIf
 | Else If expression In '(' expression_list+ ')' Then expression Equal expression ';'  # elseIfInBlock
 ;

if_condition
 : Identifier (Equal|ComparisionOperators) Quote? expression+ Quote?    # equalCondition
 | expression                                                           # expressionCondition
 | expression equals_to_null                                            # checkIfNull
 | expression op=(And|Or) expression                                    # andOrExpression
 ;

/*if_range_condition
 : expression ComparisionOperators expression ComparisionOperators expression
 ;*/

else_statement
 : Else expression Equal expression ';'
 ;

assignment_block
 : Identifier Equal Identifier '(' function_parameter ')' ';'   # functionCall
 | Identifier Equal expression expression* ';'                  # assignValue
 ;


expression
 : Value                                                                                                # value
 | Identifier                                                                                           # identifier
 | SignedFloat                                                                                          # signedFloat
 | '(' expression ')'                                                                                   # expressionBracket
 | expression '(' expression_list? ')'                                                                  # expressionBracketList
 | Not expression                                                                                       # notExpression
 | expression (Min|Max) expression                                                                      # minMaxExpression
 | expression op=('*'|'/') expression                                                                   # mulDivideExpression
 | expression op=('+'|'-') expression                                                                   # addSubtractExpression
 | expression ('||' | '!!' ) expression                                                                 # orOperatorExpression
 | expression ComparisionOperators expression ComparisionOperators expression                           # inRangeExpression
 | expression ComparisionOperators Quote? expression+ Quote?                                            # ifPlainCondition
 | expression (Equal|ComparisionOperators) Quote {_input.get(_input.index() -1).getType() == WS}? Quote # ifSpaceStringCondition
 | expression Equal expression                                                                          # equalExpression
 ;

expression_list
 : Quote? expression+ Quote? Comma?                                                # generalExpressionList
 | Quote ({_input.get(_input.index() -1).getType() == WS}?)? Quote Comma?          # spaceString
 ;

function_parameter
 : expression+
 ;

equals_to_null : Equal Pt ;

/*ArithmeticOperators
 : '+'
 | '-'
 | '*'
 | '/'
 | '**'
 ;*/

Equal : '=' ;

ComparisionOperators
 : '<'
 | '>'
 | '<='
 | '>='
 ;

And : '&' | 'and';

Or
 : '|'
 | '!'
 ;

Not
 : '^'
 | '~'
 ;

Min : '><';
Max : '<>';

If
 : 'IF'
 | 'if'
 | 'If'
 ;

Else
 : 'ELSE'
 | 'else'
 | 'Else';

Then
 : 'THEN'
 | 'then'
 | 'Then'
 ;

In : 'IN' | 'in';

Do : 'do' | 'Do';

End : 'end' | 'END';

Value
    : Int
    | DOUBLE
    | '-'DOUBLE
    | '-'Int
    | SignedFloat
    | 'null';

Int : [0-9]+;

SignedFloat
    : UnaryOperator? UnsignedFloat
    ;

MUL : '*' ; // assigns token name to '*' used above in grammar
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;

DOUBLE
    : Int Pt Int
    | Pt Int
    | Int
    ;

Pt  : '.';

UnaryOperator
 :    '+'
 |    '-'
 ;

UnsignedFloat
 :   ('0'..'9')+ '.' ('0'..'9')* Exponent?
 |   '.' ('0'..'9')+ Exponent?
 |   ('0'..'9')+ Exponent
 ;

Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

Comma : ',';

Quote
 : '\''
 | '"'
 ;

Identifier  : [a-zA-Z_] [a-zA-Z_0-9]*  ;
WS : ( ' ' | '\t' | '\r' | '\n' )-> channel(HIDDEN);
Ajay Sant