As @AustinHastings says in a comment, Ply builds the lexical scanner by combining the regular expressions supplied in the lexer class, either as the values of class members or as the docstrings of class member functions. Once the scanner is built, it will not be modified, so you really cannot dynamically adjust the regular expressions, at least not after the scanner has been generated.
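For reference, here is a minimal sketch of the two declaration forms Ply combines into the scanner (the token set is purely illustrative; the same declarations work as class members, built with lex.lex(module=self)):

    import ply.lex as lex

    tokens = ('NUMBER', 'PLUS')

    # A simple token: the pattern is the value of a variable.
    t_PLUS = r'\+'

    # A token with an action: the pattern is the function's docstring.
    def t_NUMBER(t):
        r'[0-9]+'
        t.value = int(t.value)
        return t

    t_ignore = ' \t'

    def t_error(t):
        t.lexer.skip(1)

    # The patterns above are combined once, here; changing them later
    # has no effect on the already-built scanner.
    lexer = lex.lex()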
For the particular application you have in mind, however, it is not necessary to create a custom regular expression. You can use the much simpler procedure illustrated in the Ply manual, which shows how to recognise reserved words without writing a custom regular expression for each word.
The idea is really simple. The reserved words -- function names, in your case -- are generally specific examples of some more general pattern already in use in the lexical scanner. That is almost certainly the case, because the scanner must recognise every token somehow, so before a dynamically generated word was added, it must have been recognised as something else. Rather than trying to override that other pattern for the specific instance, we simply let the token be recognised and then correct its type (and possibly its value) before returning it.
Here's a slightly modified version of the example from the Ply manual:
    def t_ID(self, t):
        r'[a-zA-Z_][a-zA-Z_0-9]*'
        # Apparently case-insensitive recognition is desired, so we use
        # the lower-case version of the token as a lookup key. This means
        # that all the keys in the funcs dictionary must be in lower-case.
        token = t.value.lower()
        if token in self.funcs:
            t.type = 'FUNC'
        return t
(You might want to adjust the above so that it does something with the value associated with the key in the funcs dictionary, although that could just as well be done later during semantic analysis.)
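For instance, here is a hypothetical variant (not from the Ply manual; the tuple layout is just one possible choice) which keeps the original spelling and attaches the looked-up entry:

    def t_ID(self, t):
        r'[a-zA-Z_][a-zA-Z_0-9]*'
        token = t.value.lower()
        if token in self.funcs:
            t.type = 'FUNC'
            # Hypothetical: pair the original spelling with whatever
            # funcs maps the name to, for use in parser actions.
            t.value = (t.value, self.funcs[token])
        return t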
Since the funcs dictionary does not in any way participate in the generation of the lexer (or parser), no particular cleverness is needed in order to pass it into the lexer object. Indeed, it does not even need to be in the lexer object; you could add the parser object to the lexer object when the lexer object is constructed, allowing you to put the dictionary into the parser object, where it is more accessible to parser actions.
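A minimal sketch of that arrangement, assuming a class-based lexer (the class name and constructor are illustrative, not taken from the question):

    import ply.lex as lex

    class Lexer:
        tokens = ('ID', 'FUNC', 'NUMBER', 'EQUALS')
        t_ignore = ' \t'
        t_EQUALS = r'='

        def __init__(self, funcs):
            # Consulted by t_ID at scan time, not at build time, so it
            # can be changed freely after the lexer is constructed.
            self.funcs = funcs
            self.lexer = lex.lex(module=self)

        def t_NUMBER(self, t):
            r'[0-9]+'
            t.value = int(t.value)
            return t

        def t_ID(self, t):
            r'[a-zA-Z_][a-zA-Z_0-9]*'
            if t.value.lower() in self.funcs:
                t.type = 'FUNC'
            return t

        def t_error(self, t):
            t.lexer.skip(1)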
One of the reasons that this is a much better solution than trying to build a customised regular expression is that it does not recognise reserved words which happen to be found as prefixes of non-reserved words. For example, if cos were one of the functions, and you had managed to produce the equivalent of
    t_ID = r'[a-zA-Z_][a-zA-Z_0-9]*'

    def t_FUNC(t):
        r'(?i)sin|cos|tan'
        # do something
then you would find that:
    cost = 3

was scanned as FUNC(cos), ID(t), '=', NUMBER(3), which is almost certainly not what you want. Putting the logic inside the t_ID function completely avoids this problem, since only complete tokens will be considered.
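To see the difference, here is the class-based sketch above run over the same input (the contents of funcs are assumed):

    lexer = Lexer({'cos': None, 'sin': None, 'tan': None})
    lexer.lexer.input("cost = 3")
    for tok in lexer.lexer:
        print(tok.type, tok.value)
    # Output:
    #   ID cost
    #   EQUALS =
    #   NUMBER 3

Here cost survives as a single ID, because t_ID sees the complete token before the dictionary is consulted.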