I am using ANTLR 2.7.4 to create a lexer, and I am stuck on the following case:
If a colon (':') is followed by characters from class C1, a COLON token should be emitted, followed by a C1 token.
If a colon is followed by a character from class C2, the colon should be taken as part of C2 and a single C2 token should be emitted.
Assuming class C1 is {1,2,3} and class C2 is {A,B,C}, then :13 should be tokenized as COLON followed by C1, whereas :AB should be tokenized as a single C2.
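To make the intended behaviour concrete, here is a minimal hand-written sketch in Python (not ANTLR) of the decision I want the lexer to make. The function name `tokenize` is only for illustration, and I am assuming that a run of consecutive characters from one class forms a single token.

```python
C1 = set("123")  # class C1 from the example above
C2 = set("ABC")  # class C2 from the example above


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs for the C1/C2 example."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == ":" and i + 1 < n and text[i + 1] in C2:
            # One character of lookahead: the colon belongs to the C2 token.
            j = i + 1
            while j < n and text[j] in C2:
                j += 1
            tokens.append(("C2", text[i:j]))
            i = j
        elif ch == ":":
            # Any other colon stands alone as COLON.
            tokens.append(("COLON", ch))
            i += 1
        elif ch in C1 or ch in C2:
            # Assumption: a run of characters from the same class is one token.
            cls, name = (C1, "C1") if ch in C1 else (C2, "C2")
            j = i
            while j < n and text[j] in cls:
                j += 1
            tokens.append((name, text[i:j]))
            i = j
        elif ch.isspace():
            i += 1  # skip whitespace
        else:
            raise ValueError("unexpected character: %r" % ch)
    return tokens
```

With this sketch, `tokenize(":13")` gives COLON followed by C1, and `tokenize(":AB")` gives a single C2 token. What I cannot work out is how to express the same lookahead test in an ANTLR lexer rule.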
More concretely, I have a grammar for a language which has two constructs:
- Identifier : Type // It has three tokens: IDENT COLON IDENT. Pascal-like type annotation
- :: // This is an identifier. There is a class of characters that can be used in identifiers. A colon can be part of an identifier provided it is used together with other characters of that class
Some examples:
- myvar : Int // IDENT COLON IDENT
- :: // IDENT
- :$ // IDENT
- &:& // IDENT
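The same examples, written as a small Python sketch of the behaviour I am after (again, not ANTLR). The set `OP_CHARS` is a hypothetical stand-in for the real character class, which is larger, and the rule for alphanumeric identifiers is my assumption about the usual letter-then-letters-or-digits form.

```python
OP_CHARS = set(":$&")  # hypothetical subset of the symbolic identifier class


def lex_idents(text):
    """Return (token_type, lexeme) pairs for the identifier/colon examples."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1  # skip whitespace
        elif ch.isalpha() or ch == "_":
            # Assumed form of ordinary identifiers: letter, then letters/digits.
            j = i + 1
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(("IDENT", text[i:j]))
            i = j
        elif ch in OP_CHARS:
            # Take the longest run of symbolic characters.
            j = i
            while j < n and text[j] in OP_CHARS:
                j += 1
            lexeme = text[i:j]
            # A colon on its own is COLON; any longer run is an identifier.
            tokens.append(("COLON" if lexeme == ":" else "IDENT", lexeme))
            i = j
        else:
            raise ValueError("unexpected character: %r" % ch)
    return tokens
```

So a lone colon is COLON, while a colon adjacent to other characters of the class becomes part of a single IDENT.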
A follow-up question: is it possible to check whether a given lookahead character belongs to a certain token class? Any suggestions would be really appreciated.
EDIT
I guess I am the only user of ANTLR v2. I would be happy to have a solution in ANTLR v3 and then see whether I can port it back to ANTLR v2.