
I apologize in advance if this question has already been asked; I can't seem to find it.

I'm just getting started with ANTLR, using the antlr4IDE for Eclipse to create a parser for a small subset of Java. For some reason, unless I explicitly include whitespace in my rules, the parser throws an error.

My grammar:

grammar Hello;


r  : 
    (Statement ';')+  
    ;         


Statement: 
    DECL | INIT 
    ;

DECL: 
    'int' ID 
    ; 

INIT: 
    DECL '=' NUMEXPR 
    ;

NUMEXPR : 
    Number OP Number | Number 
    ;

OP : 
      '+' 
    | '-' 
    | '/' 
    | '*' 
    ; 

WS  :  
    [ \t\r\n\u000C]+ -> skip
    ;

Number: 
    [0-9]+ 
    ;

ID : 
    [a-zA-Z]+ 
    ; 

When trying to parse

    int hello = 76;  

I receive the error:

 Hello::r:1:0: mismatched input 'int' expecting Statement
 Hello::r:1:10: token recognition error at: '='

However, when I manually add the token WS into the rules, I receive no error.

Any ideas where I'm going wrong? I'm new to ANTLR, so I'm probably making a stupid mistake. Thanks in advance.

EDIT: Here are my parse tree and error log:

[image: Parse Tree]

[image: Error Log]

Slavvio

2 Answers


Change the syntax like this, so that every parser rule name starts with a lowercase letter:

grammar Hello;
r         : (statement ';')+ ;         
statement : decl | init ;
decl      : 'int' ID  ; 
init      : decl '=' numexpr ;
numexpr   : Number op Number | Number ;
op        : '+' | '-' | '/' | '*' ; 
WS        : [ \t\r\n\u000C]+ -> skip ;
Number    : [0-9]+ ;
ID        : [a-zA-Z]+ ; 
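In the corrected grammar, WS, Number and ID are lexer rules that produce tokens (whitespace is skipped), while the lowercase rules are parser rules that arrange those tokens. As a rough illustration only, a hand-written recursive-descent parser for the same rules might look like the Python sketch below. `lex`, `Parser` and the tuple results are hypothetical names made up for this sketch, and it is not what ANTLR generates; `statement` also factors out the shared `decl` prefix of `decl | init` instead of choosing an alternative up front.

```python
import re

# Hypothetical lexer for the corrected grammar: WS, Number, ID and the
# literal tokens (';', 'int', '=', operators) used in the parser rules.
LEXEME = re.compile(r"[ \t\r\n\f]+|[0-9]+|[a-zA-Z]+|[;=+\-/*]")
OPS = {"'+'", "'-'", "'/'", "'*'"}

def lex(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = LEXEME.match(text, pos)
        if m is None:
            raise SyntaxError(f"token recognition error at: {text[pos]!r}")
        lexeme, pos = m.group(), m.end()
        if lexeme[0] in " \t\r\n\f":
            continue  # WS -> skip: the parser never sees whitespace
        if lexeme.isdigit():
            tokens.append(("Number", lexeme))
        elif lexeme.isalpha():
            # "int" matches both the 'int' literal and ID; the literal wins the tie
            tokens.append(("'int'" if lexeme == "int" else "ID", lexeme))
        else:
            tokens.append((f"'{lexeme}'", lexeme))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"mismatched input {self.peek()} expecting {kind}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def r(self):  # r : (statement ';')+ ;
        stmts = []
        while True:
            stmts.append(self.statement())
            self.expect("';'")
            if self.peek() != "'int'":
                return stmts

    def statement(self):  # statement : decl | init ;  with init : decl '=' numexpr ;
        name = self.decl()
        if self.peek() == "'='":
            self.expect("'='")
            return ("init", name, self.numexpr())
        return ("decl", name)

    def decl(self):  # decl : 'int' ID ;
        self.expect("'int'")
        return self.expect("ID")

    def numexpr(self):  # numexpr : Number op Number | Number ;
        left = int(self.expect("Number"))
        if self.peek() in OPS:
            op = self.expect(self.peek())
            return (op, left, int(self.expect("Number")))
        return left

print(Parser(lex("int hello = 76;")).r())
# [('init', 'hello', 76)]
print(Parser(lex("int a; int b = 1 + 2;")).r())
# [('decl', 'a'), ('init', 'b', ('+', 1, 2))]
```

The spaces in the input never reach the parser, which is why none of the lowercase rules needs to mention WS.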


  • I'm still having the same error, even after copying exactly what you wrote. I posted the parse tree in the original post – Slavvio Mar 20 '17 at 15:07
  • Never mind, it worked, thank you! Follow-up question: do you happen to know why capitalizing the symbols made a difference? – Slavvio Mar 20 '17 at 15:14
  • See [Grammar Lexicon](https://github.com/antlr/antlr4/blob/4.6/doc/lexicon.md). Token names always start with a capital letter. Parser rule names always start with a lowercase letter. –  Mar 20 '17 at 22:02
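To make that comment concrete, here is a rough Python simulation of what the lexer does with the question's original grammar. It assumes ANTLR's tie-breaking rules as I understand them: the longest match wins, and on equal length the rule defined first wins, with literals used in parser rules (such as ';') acting as implicit token rules defined ahead of the explicit ones. The regexes, `tokenize`, `ORIGINAL_RULES` and `FIXED_RULES` are hypothetical approximations, not ANTLR's implementation, and unmatched characters are simply reported and skipped. Because Statement, DECL and INIT start with a capital letter, they are lexer rules, so `'int' ID` has to match with no whitespace in between; `int` therefore falls back to ID, and `=` matches no rule at all, mirroring the two errors in the question.

```python
import re

NUMEXPR = r"[0-9]+[-+/*][0-9]+|[0-9]+"
DECL = r"int[a-zA-Z]+"  # 'int' ID inside a lexer rule: no whitespace allowed

# Approximation of the question's grammar, in definition order.
ORIGINAL_RULES = [
    ("';'", r";"),  # implicit token from the parser rule r
    ("Statement", DECL + r"(?:=(?:" + NUMEXPR + r"))?"),
    ("DECL", DECL),
    ("INIT", DECL + r"=(?:" + NUMEXPR + r")"),
    ("NUMEXPR", NUMEXPR),
    ("OP", r"[-+/*]"),
    ("WS", r"[ \t\r\n\f]+"),
    ("Number", r"[0-9]+"),
    ("ID", r"[a-zA-Z]+"),
]

# Approximation of the corrected grammar: only literals, WS, Number, ID.
FIXED_RULES = [
    ("';'", r";"), ("'int'", r"int"), ("'='", r"="),
    ("'+'", r"\+"), ("'-'", r"-"), ("'/'", r"/"), ("'*'", r"\*"),
    ("WS", r"[ \t\r\n\f]+"),
    ("Number", r"[0-9]+"),
    ("ID", r"[a-zA-Z]+"),
]

def tokenize(text, rules, skip=("WS",)):
    """Longest match wins; an equally long later rule never replaces an earlier one."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            errors.append((pos, text[pos]))  # like "token recognition error at: ..."
            pos += 1
            continue
        if best_name not in skip:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens, errors

src = "int hello = 76;"
print(tokenize(src, ORIGINAL_RULES))
# ([('ID', 'int'), ('ID', 'hello'), ('NUMEXPR', '76'), ("';'", ';')], [(10, '=')])
print(tokenize(src, FIXED_RULES))
# ([("'int'", 'int'), ('ID', 'hello'), ("'='", '='), ('Number', '76'), ("';'", ';')], [])
```

With the original grammar the parser's first token is ID rather than Statement (hence "mismatched input 'int' expecting Statement"), and column 10 is exactly where `=` sits.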

After looking at the ANTLR 4 documentation, it seems that you need a specification for every character combination you expect to see in your file, from start to finish, not just the ones you want to handle.

In that regard, it's expected that you would have to state the whitespace explicitly, with something like:

WS : [ \t\r\n]+ -> skip;

That's why the skip command exists:

A 'skip' command tells the lexer to get another token and throw out the current text.

Note, though, that this can sometimes cause a little trouble, as in this post.
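As a rough sketch of what `-> skip` means, the Python lexer loop below throws away the text of any rule marked "skip" instead of emitting a token. The rule table, the PUNCT rule and `lex` are hypothetical approximations for illustration, not ANTLR code. Dropping the WS rule illustrates the point made in this answer: every space then becomes a recognition error, because no rule accounts for it.

```python
import re

# Hypothetical rule table; the "skip" action mimics `WS : [ \t\r\n]+ -> skip;`
RULES = [
    ("WS", re.compile(r"[ \t\r\n]+"), "skip"),
    ("Number", re.compile(r"[0-9]+"), None),
    ("ID", re.compile(r"[a-zA-Z]+"), None),
    ("PUNCT", re.compile(r"[;=]"), None),
]

def lex(text, rules):
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        best = None  # longest match; earlier rules win ties
        for name, regex, action in rules:
            m = regex.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, action)
        if best is None:
            errors.append(text[pos])  # no rule matches this character
            pos += 1
            continue
        name, m, action = best
        if action != "skip":  # skip: discard the text and look for the next token
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens, errors

src = "int hello = 76;"
print(lex(src, RULES))
# ([('ID', 'int'), ('ID', 'hello'), ('PUNCT', '='), ('Number', '76'), ('PUNCT', ';')], [])
print(lex(src, RULES[1:]))  # same rules without WS
# ([('ID', 'int'), ('ID', 'hello'), ('PUNCT', '='), ('Number', '76'), ('PUNCT', ';')], [' ', ' ', ' '])
```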

Addison