I want to define a lexer rule for a range of Unicode characters whose code points need more than four hexadecimal digits. Concretely, I want to declare the following rule:
ID_Continue : [\uE0100-\uE01EF] ;
Unfortunately, this doesn't work: the rule matches characters that are outside this range. (I'm not sure exactly what behaviour it results in, but it isn't what I want.) I've also tried the following, padding with leading zeros to eight digits:
ID_Continue : [\U000E0100-\U000E01EF] ;
But it seems to result in the same unwanted behaviour.
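For reference, here is a minimal Java sketch of the range I have in mind. It is not part of my grammar, and the class and method names are made up for illustration. It only uses standard `Character` methods and shows that both endpoints are supplementary code points, which Java's UTF-16 strings store as surrogate pairs. I assume this is related to why a four-digit `\u` escape cannot express them, but I have not confirmed how ANTLR actually reads the rule.

```java
// Hypothetical helper class, only to illustrate the intended range.
class SupplementaryRange {
    // Endpoints of the range from the rule above.
    static final int LOW = 0xE0100;
    static final int HIGH = 0xE01EF;

    // True if the code point lies in the range the rule is meant to match.
    static boolean inRange(int codePoint) {
        return codePoint >= LOW && codePoint <= HIGH;
    }

    // Formats the UTF-16 code units of a code point as space-separated hex.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both endpoints lie above U+FFFF, so each needs two UTF-16 code units.
        System.out.println(Character.isSupplementaryCodePoint(LOW));  // true
        System.out.println(utf16Units(LOW));   // DB40 DD00
        System.out.println(utf16Units(HIGH));  // DB40 DDEF
    }
}
```

So what I want is a rule that accepts exactly the code points for which `inRange` returns true, and nothing else.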
I am using ANTLR 4 and its IntelliJ plugin for testing.
Does ANTLR 4 not support Unicode literals above \uFFFF?