Antlr4 token ambiguity for single character

Question

I have a problem with the rule mnemonic_format.

Instead of recognizing a simple text like A100, it gives the following error:

mismatched input 'A100' expecting 'A'

The grammar is:

grammar SimpleMathGrammar;

INTEGER     : [0-9]+;
FLOAT       : [0-9]+ '.' [0-9]+;

ADD         : '+';
SUB         : '-';
DOT         : '.';

AND     : 'AND';

BACKSLASH   : '\\';

fragment SINGLELETTER   :   ( 'a'..'z' | 'A'..'Z');


fragment LOWERCASE  :   'a'..'z';
fragment UNDERSCORE :   '_';
fragment DOLLAR     :   '$';
fragment NUMBER     :   '0'..'9';

VARIABLENAME
    :   SINGLELETTER
    |   (SINGLELETTER|UNDERSCORE) (SINGLELETTER | UNDERSCORE | DOLLAR | NUMBER)*;

HASH    : '#';

/* PARSER */

operation
        : (INTEGER | FLOAT) ADD (INTEGER | FLOAT)
        | (INTEGER | FLOAT) SUB (INTEGER | FLOAT);

operation_with_backslash    : BACKSLASH operation BACKSLASH;

mnemonic: HASH VARIABLENAME HASH;

mnemonic_format

        // Example: A100
        : 'A' INTEGER;

At this point, I know that the token VARIABLENAME should not include the character A (correct me if I'm wrong).

So what can I do to use a single character (or a fixed sequence) in a separate rule? (And what is my error?)

EDIT: I found the origin of the problem (by removing all of the other tokens and rules) in the following token:

VARIABLENAME: (SINGLELETTER|UNDERSCORE) (SINGLELETTER | UNDERSCORE | DOLLAR | NUMBER)*;

So how can I create a token or lexer rule that gives me a basis for detecting generic text (like a class name or a variable name) while also writing rules that must accept a fixed sequence of characters?

Massimo · Accepted Answer · 2018-03-28T18:02:51.737

Ok,

The trick was the "general scope" of the token VARIABLENAME.

In other words, the token is too generic.

In my case the sub-case VARIABLENAME: SINGLELETTER NUMBER* collides with the rule mnemonic_format: 'A' INTEGER

(Indeed, the string A100 can be produced by either VARIABLENAME or mnemonic_format, and this creates an ambiguity. Because the lexer runs before the parser and prefers the longest match, A100 always comes out as a single VARIABLENAME token.)
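To make the collision concrete, here is a stripped-down sketch of just the rules involved. The grammar name Collision is made up for illustration, and the character sets are a compact rewrite of the SINGLELETTER, UNDERSCORE, DOLLAR and NUMBER fragments from the question.

```antlr
grammar Collision;   // hypothetical name, for illustration only

// The literal 'A' becomes an implicit token that matches exactly one character.
mnemonic_format : 'A' INTEGER ;

INTEGER : [0-9]+ ;

// For the input "A100" this rule matches all four characters, while the
// implicit 'A' token matches only one. The lexer keeps the longest match,
// so the parser receives one VARIABLENAME token instead of 'A' followed
// by INTEGER, which explains "mismatched input 'A100' expecting 'A'".
VARIABLENAME : [a-zA-Z_] [a-zA-Z_$0-9]* ;
```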

So I "specialized" VARIABLENAME to require a prefix, for example:

VARIABLENAME
     : HASH (SINGLELETTER|UNDERSCORE)(SINGLELETTER|UNDERSCORE|DOLLAR|NUMBER)*
     | 'class ' (SINGLELETTER|UNDERSCORE)(SINGLELETTER|UNDERSCORE|DOLLAR|NUMBER)*
     ...

This should avoid the ambiguity between the token and the rule.
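As a rough sketch of how the prefixed token could sit in a small grammar, assuming only the '#' prefix is used: the grammar name PrefixedNames and the NAME fragment are made up here, and the change to the mnemonic rule is my assumption, needed because the leading '#' is now consumed by VARIABLENAME itself.

```antlr
grammar PrefixedNames;   // hypothetical name, for illustration only

// Assumption: the opening '#' is part of VARIABLENAME now, so only the
// closing '#' remains as a separate HASH token.
mnemonic        : VARIABLENAME HASH ;
mnemonic_format : 'A' INTEGER ;

INTEGER : [0-9]+ ;
HASH    : '#' ;

// A name must start with '#', so a bare "A100" can no longer match here.
VARIABLENAME : '#' NAME ;

// NAME is a hypothetical helper fragment standing in for
// (SINGLELETTER|UNDERSCORE)(SINGLELETTER|UNDERSCORE|DOLLAR|NUMBER)*
fragment NAME : [a-zA-Z_] [a-zA-Z_$0-9]* ;
```

With these rules, A100 should lex as 'A' followed by INTEGER, while #foo# should lex as VARIABLENAME followed by HASH. The trade-off is that unprefixed identifiers are no longer recognized at all.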
