In my ANTLr code, we should be able to recognize strings, characters, hexadecimal numbers etc.
However, in my code, when I test it like this:
grun A1_lexer tokens -tokens test.txt
With my test.txt file being a simple string, such as "pineapple", it is unable to recognize the different tokens.
In my lexer, I define the following helper tokens:
fragment Delimiter: ' ' | '\t' | '\n' ;
fragment Alpha: [a-zA-Z_];
fragment Char: ['a'-'z'] | ['A' - 'Z'] | ['0' - '9'] ;
fragment Digit: ['0'-'9'] ;
fragment Alpha_num: Alpha | Digit ;
fragment Single_quote: '\'' ;
fragment Double_quote: '\"' ;
fragment Hex_digit: Digit | [a-fA-F] ;
And I define the following tokens:
Char_literal : (Single_quote)Char(Single_quote) ;
String_literal : (Double_quote)Char*(Double_quote) ;
Id: Alpha Alpha_num* ;
I run it like this:
grun A1_lexer tokens -tokens test.txt
And it outputs this:
line 1:0 token recognition error at: '"'
line 1:1 token recognition error at: 'p'
line 1:2 token recognition error at: 'ine'
line 1:6 token recognition error at: 'p'
line 1:7 token recognition error at: 'p'
line 1:8 token recognition error at: 'l'
line 1:9 token recognition error at: 'e"'
[@0,5:5='a',<Id>,1:5]
[@1,12:11='<EOF>',<EOF>,2:0]
I am really wondering what the problem is and how I could fix it. Thanks.
UPDATE 1:
fragment Delimiter: ' ' | '\t' | '\n' ;
fragment Alpha: [a-zA-Z_];
fragment Char: [a-zA-Z0-9] ;
fragment Digit: [0-9] ;
fragment Alpha_num: Alpha | Digit ;
fragment Single_quote: '\'' ;
fragment Double_quote: '\"' ;
I have updated the code, I got rid of the un-necessary single quotes in my Char classification. However, I get the same output as before.
UPDATE 2:
Even when I make the changes suggested, I still get the same error. I believed the problem is that I am not recompiling, but I am. These are the steps that I take to recompile.
antlr4 A1_lexer.g4
javac A1_lexer*.java
chmod a+x build.sh
./build.sh
grun A1_lexer tokens -tokens test.txt
With my build.sh file looking like this:
#!/bin/bash
FILE="A1_lexer"
ANTLR=$(echo $CLASSPATH | tr ':' '\n' | grep -m 1 "antlr-4.7.1-
complete.jar")
java -jar $ANTLR $FILE.g4
javac $FILE*.java
Even when I recompile, my antlr code is still unable to recognize the tokens.
My code is also now like this:
fragment Delimiter: ' ' | '\t' | '\n' ;
fragment Alpha: [a-zA-Z_];
fragment Char: [a-zA-Z0-9] ;
fragment Digit: [0-9] ;
fragment Alpha_num: Alpha | Digit ;
fragment Single_quote: '\'' ;
fragment Double_quote: '"' ;
fragment Hex_digit: Digit | [a-fA-F] ;
fragment Eq_op: '==' | '!=' ;
Char_literal : (Single_quote)Char(Single_quote) ;
String_literal : (Double_quote)Char*(Double_quote) ;
Decimal_literal : Digit+ ;
Id: Alpha Alpha_num* ;
UPDATE 3:
Grammar:
program
:'class Program {'field_decl* method_decl*'}'
field_decl
: type (id | id'['int_literal']') ( ',' id | id'['int_literal']')*';'
| type id '=' literal ';'
method_decl
: (type | 'void') id'('( (type id) ( ','type id)*)? ')'block
block
: '{'var_decl* statement*'}'
var_decl
: type id(','id)* ';'
type
: 'int'
| 'boolean'
statement
: location assign_op expr';'
| method_call';'
| 'if ('expr')' block ('else' block )?
| 'switch' expr '{'('case' literal ':' statement*)+'}'
| 'while (' expr ')' statement
| 'return' ( expr )? ';'
| 'break ;'
| 'continue ;'
| block
assign_op
: '='
| '+='
| '-='
method_call
: method_name '(' (expr ( ',' expr )*)? ')'
| 'callout (' string_literal ( ',' callout_arg )* ')'
method_name
: id
location
: id
| id '[' expr ']'
expr
: location
| method_call
| literal
| expr bin_op expr
| '-' expr
| '!' expr
| '(' expr ')'
callout_arg
: expr
| string_literal
bin_op
: arith_op
| rel_op
| eq_op
| cond_op
arith_op
: '+'
| '-'
| '*'
| '/'
| '%'
rel_op
: '<'
| '>'
| '<='
| '>='
eq_op
: '=='
| '!='
cond_op
: '&&'
| '||'
literal
: int_literal
| char_literal
| bool_literal
id
: alpha alpha_num*
alpha
: ['a'-'z''A'-'Z''_']
alpha_num
: alpha
| digit
digit
: ['0'-'9']
hex_digit
: digit
| ['a'-'f''A'-'F']
int_literal
: decimal_literal
| hex_literal
decimal_literal
: digit+
hex_literal
: '0x' hex_digit+
bool_literal
: 'true'
| 'false'
char_literal
: '‘'char'’'
string_literal
: '“'char*'”'
test.txt :
"pineapple"
A1_lexer:
fragment Delimiter: ' ' | '\t' | '\n' ;
fragment Alpha: [a-zA-Z_];
fragment Char: [a-zA-Z0-9] ;
fragment Digit: [0-9] ;
fragment Alpha_num: Alpha | Digit ;
fragment Single_quote: '\'' ;
fragment Double_quote: '"' ;
fragment Hex_digit: Digit | [a-fA-F] ;
fragment Eq_op: '==' | '!=' ;
Char_literal : (Single_quote)Char(Single_quote) ;
String_literal : (Double_quote)Char*(Double_quote) ;
Decimal_literal : Digit+ ;
Id: Alpha Alpha_num* ;
What I Write in Terminal:
grun A1_lexer tokens -tokens test.txt
Output in Terminal:
line 1:0 token recognition error at: '"'
line 1:1 token recognition error at: 'p'
line 1:2 token recognition error at: 'ine'
line 1:6 token recognition error at: 'p'
line 1:7 token recognition error at: 'p'
line 1:8 token recognition error at: 'l'
line 1:9 token recognition error at: 'e"'
[@0,5:5='a',<Id>,1:5]
[@1,12:11='<EOF>',<EOF>,2:0]
I am really not sure why this is happening.