I'm trying to implement a lexer rule along the lines of "all characters in the Letter and Symbol Unicode categories except a few reserved characters." From the lexer rules documentation, I know I can use \p{___}
to match against Unicode categories, but I am unsure how to exclude certain characters from such a match.
Looking at example grammars, I am led in a few different directions. For example, the Java 9 grammar seems to use semantic predicates to call Java's built-in Character.isJavaIdentifierStart() and Character.isJavaIdentifierPart() directly,
while others manually define every valid character.
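To make the predicate route concrete, here is a rough sketch of the kind of check I imagine calling from a predicate. The class name, the method name, and the three reserved characters are placeholders I made up for illustration, not part of any existing grammar, and I am assuming Java 9+ for Set.of.

```java
import java.util.Set;

final class IdentifierChars {
    // Placeholder reserved characters (hypothetical; my real list differs).
    private static final Set<Integer> RESERVED =
            Set.of((int) '$', (int) '+', (int) '=');

    /** True if codePoint is in a Letter (L*) or Symbol (S*) category and is not reserved. */
    static boolean isAllowed(int codePoint) {
        if (RESERVED.contains(codePoint)) {
            return false;
        }
        switch (Character.getType(codePoint)) {
            // Letter categories: Lu, Ll, Lt, Lm, Lo
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            // Symbol categories: Sm, Sc, Sk, So
            case Character.MATH_SYMBOL:
            case Character.CURRENCY_SYMBOL:
            case Character.MODIFIER_SYMBOL:
            case Character.OTHER_SYMBOL:
                return true;
            default:
                return false;
        }
    }
}
```

This works on code points rather than chars, so it should also cover characters outside the Basic Multilingual Plane, but it moves the logic out of the grammar and ties it to the Java target, which is part of why I am unsure it is the right direction.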
How can I achieve this functionality?