For a parser I am creating, I use this regular expression as the definition of an ID:
ID: /[a-z_][a-z0-9]*/i
(For anyone who is not familiar with the syntax of the particular parser I'm using, the "i" flag simply means case-insensitive.)
I also have a number of keywords, like this:
CALL_KW: "call"
PRINT_KW: "print"
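To make the overlap concrete, here is a minimal sketch using Python's `re` module. It assumes that Lark's `/.../i` terminal behaves like a Python regex compiled with `re.IGNORECASE`. The sketch only checks that each keyword is also a complete match for the ID pattern, which is why the lexer sees two valid tokenizations.

```python
import re

# The ID terminal from the grammar, translated to Python's re module.
# Assumption: Lark's /.../i flag corresponds to re.IGNORECASE.
ID = re.compile(r"[a-z_][a-z0-9]*", re.IGNORECASE)

# The keyword strings from the grammar.
KEYWORDS = ["call", "print"]

# Every keyword is also a full match for ID, so a lexer that considers
# both terminals has two valid ways to tokenize the same input.
overlapping = [kw for kw in KEYWORDS if ID.fullmatch(kw)]
print(overlapping)  # ['call', 'print']
```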
The problem is that, due to some ambiguities in the grammar, keywords are sometimes treated as IDs, which I really don't want. So I wondered whether I could rewrite the regular expression for ID so that it doesn't match keywords at all. Is such a thing possible?
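A sketch of what such a rewrite could look like, again assuming Python `re` semantics: a negative lookahead at the start of the pattern rejects a keyword unless it is followed by another ID character. The inner `(?![a-z0-9])` mirrors the ID's own tail class, so longer identifiers such as `caller` or `print2` are still accepted. Whether Lark's dynamic Earley lexer handles lookarounds in terminal patterns exactly like plain `re` is an assumption that would need checking.

```python
import re

# Keywords the ID terminal should reject (taken from the grammar above).
KEYWORDS = ["call", "print"]

# Negative lookahead: refuse to start an ID match when the input at this
# position is a keyword that is NOT followed by another ID tail character.
keyword_alt = "|".join(map(re.escape, KEYWORDS))
ID_NO_KW = re.compile(
    rf"(?!(?:{keyword_alt})(?![a-z0-9]))[a-z_][a-z0-9]*",
    re.IGNORECASE,
)

# Keywords (in any case) are rejected; identifiers that merely start
# with a keyword are still accepted.
for word in ["call", "CALL", "print", "caller", "print2", "x1", "_tmp"]:
    print(word, bool(ID_NO_KW.fullmatch(word)))
```

In Lark syntax the same idea would presumably be written as `ID: /(?!(call|print)(?![a-z0-9]))[a-z_][a-z0-9]*/i`, with the keyword list kept in sync by hand.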
To give some more context, I'm using the Lark parser library for Python. The Earley parser that Lark provides, together with its dynamic lexer, is quite flexible and powerful at handling ambiguous grammars, but it sometimes does weird things like this (and non-deterministically, at that!). So I'm trying to give the parser some help by making sure keywords never match the ID terminal.