antlr4, trivial grammar, token recognition errors

Question

As a complete beginner in antlr4, I haven't been able to make any use of the answer to a similar question. As far as I can tell, the fragments in my grammar are only referenced by terminal rules, but the parser still reports the following errors when given the string "myIdentifier":

line 1:0 token recognition error at: 'm'
line 1:1 token recognition error at: 'y'
line 1:2 token recognition error at: 'I'
line 1:3 token recognition error at: 'd'
line 1:4 token recognition error at: 'e'
line 1:5 token recognition error at: 'n'
line 1:6 token recognition error at: 't'
line 1:7 token recognition error at: 'i'
line 1:8 token recognition error at: 'f'
line 1:9 token recognition error at: 'i'
line 1:10 token recognition error at: 'e'
line 1:11 token recognition error at: 'r'

My grammar is this:

grammar Sable;

options {

}

@header {
    package org.sable.parser.gen;
}

IDENTIFIER:
    (IdentifierHead IdentifierCharacter*)
    | ('`'(IdentifierHead IdentifierCharacter*)'`')
    ;

WS  :  [ \u0020\u000C\u000A\u000D\u0009\u000B\u000C]+ -> skip
    ;

COMMENT
    :   '/*' .*? '*/' -> channel(HIDDEN)
    ;

LINE_COMMENT
    :   '//' ~[\u000A\u000D]* -> channel(HIDDEN)
    ;




// NOTE: a file with zero statements is allowed because
// it can contain just comments.
sourceFile:
    statement* EOF;

statement:
    expression ';'?;

// Requires that no valid expression starts with
// an equals sign or any other assignment operator.
expression:
    valuedExpression (assignmentOperator valuedExpression)?;

valuedExpression:
    IDENTIFIER
    ;

assignmentOperator:
    '='
    | '*='
    | '/='
    | '%='
    | '+='
    | '-='
    | '<<='
    | '>>='
    | '&='
    | '^='
    | '|='
    ;

fragment DecimalDigit:
    '0'..'9'
    ;

fragment IdentifierHead:
    'a'..'z'
    | 'A'..'Z'
    | '_'
    | '\u00A8'
    | '\u00AA'
    | '\u00AD'
    | '\u00AF' |
    '\u00B2'..'\u00B5' |
    '\u00B7'..'\u00BA'  |
    '\u00BC'..'\u00BE' |
    '\u00C0'..'\u00D6' |
    '\u00D8'..'\u00F6' |
    '\u00F8'..'\u00FF' |
    '\u0100'..'\u02FF' |
    '\u0370'..'\u167F' |
    '\u1681'..'\u180D' |
    '\u180F'..'\u1DBF' |
    '\u1E00'..'\u1FFF' |
    '\u200B'..'\u200D' |
    '\u202A'..'\u202E' |
    '\u203F'..'\u2040' |
    '\u2054' |
    '\u2060'..'\u206F' |
    '\u2070'..'\u20CF' |
    '\u2100'..'\u218F' |
    '\u2460'..'\u24FF' |
    '\u2776'..'\u2793' |
    '\u2C00'..'\u2DFF' |
    '\u2E80'..'\u2FFF' |
    '\u3004'..'\u3007' |
    '\u3021'..'\u302F' |
    '\u3031'..'\u303F' |
    '\u3040'..'\uD7FF' |
    '\uF900'..'\uFD3D' |
    '\uFD40'..'\uFDCF' |
    '\uFDF0'..'\uFE1F' |
    '\uFE30'..'\uFE44' |
    '\uFE47'..'\uFFFD'
    ;
fragment IdentifierCharacter:
    DecimalDigit
    | '\u0300'..'\u036F'
    | '\u1DC0'..'\u1DFF'
    | '\u20D0'..'\u20FF'
    | '\uFE20'..'\uFE2F'
    | IdentifierHead
    ;

What am I doing wrong? My assumptions are:

IDENTIFIER is a terminal
IdentifierHead and IdentifierCharacter are fragments
The rest are all parse rules.

I've now replaced the rules completely with part of a tested grammar I found online, and I'm getting the same errors for similar test strings. Since the only parts common to both failing cases are the options block and the @header directive, I guess the error might be caused by them. — AmazingWouldBeGreatBut, Aug 31 '17 at 15:33
When I test your grammar, `"myIdentifier"` is tokenised as an `IDENTIFIER`. Perhaps you need to regenerate your lexer/parser? With and without the empty `options { }` block it works just fine. — Bart Kiers, Aug 31 '17 at 15:33
Hi Bart. Thanks a lot for your input. Well, that makes some sense, at least to the extent that it's my environment that is wrong, not my grammar(s). I'm using Certiv's Eclipse embedded generator. I will try to do it via a stand-alone installation of antlr4 and see how it goes. — AmazingWouldBeGreatBut, Aug 31 '17 at 15:35
Well, that was it - I don't know why, but the classes generated by Certiv's plugin just make the test fail for the same grammar. Thanks again for your comment - it pushed me in the right direction. There is no top-level answer. Shall I do something to close this question taking your comment as the answer? — AmazingWouldBeGreatBut, Aug 31 '17 at 15:50

score 1 · Accepted Answer · answered Aug 31 '17 at 16:07

On the basis of Bart Kiers' comment:

When I test your grammar, "myIdentifier" is tokenised as an IDENTIFIER. Perhaps you need to regenerate your lexer/parser? With and without the empty options { } block it works just fine.

it turned out that the problem was in my environment rather than in my grammar. I was using Certiv's antlr4 support plugins for Eclipse to generate the lexer and parser from my grammar. As soon as I generated them with antlr4 from the command line instead, the errors disappeared.
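
For anyone who wants to rule out the grammar before blaming the tooling, the shape of the IDENTIFIER rule can be sanity-checked without any generated code. The following is only a minimal sketch in plain Java, not what antlr4 generates: the class and method names are hypothetical, it covers just the ASCII part of IdentifierHead and IdentifierCharacter (the Unicode ranges are left out), and it tests a whole string rather than doing longest-match tokenisation on a stream.

```java
// Hypothetical hand-written stand-in for the IDENTIFIER lexer rule.
// Assumption: only the ASCII alternatives of the fragments are modelled.
final class IdentifierCheck {

    // ASCII part of fragment IdentifierHead: 'a'..'z' | 'A'..'Z' | '_'
    static boolean isHead(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    // ASCII part of fragment IdentifierCharacter: DecimalDigit | IdentifierHead
    static boolean isCharacter(char c) {
        return (c >= '0' && c <= '9') || isHead(c);
    }

    // First alternative of IDENTIFIER: IdentifierHead IdentifierCharacter*
    static boolean isPlain(String s) {
        if (s.isEmpty() || !isHead(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isCharacter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // IDENTIFIER: the plain form, or the plain form wrapped in backticks
    static boolean isIdentifier(String s) {
        boolean quoted = s.length() >= 2
                && s.charAt(0) == '`'
                && s.charAt(s.length() - 1) == '`';
        return quoted ? isPlain(s.substring(1, s.length() - 1)) : isPlain(s);
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("myIdentifier"));   // prints true
        System.out.println(isIdentifier("`myIdentifier`")); // prints true
        System.out.println(isIdentifier("9lives"));         // prints false
    }
}
```

Since every character of "myIdentifier" is accepted here, a token recognition error on each letter points to stale or incorrectly generated lexer classes rather than to the rule itself, which matches what regenerating from the command line showed.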
