I'm writing a compiler in Python, and I wrote the lexer by hand because I couldn't figure out how to handle indentation in PLY. My lexer is a generator that uses yield statements, like so:
    def scan(self):
        ...
        for i in tokens:
            if i[0]:
                # Identifier or keyword
                yield Token(self.line, i[0] if i[0] in keywords else "ident", i[0])
            elif i[1]:
                # Float literal, optionally with an exponent
                if "e" in i[1]:
                    base, exp = i[1].split("e")
                    val = float(base) * 10 ** int(exp)
                else:
                    val = float(i[1])
                yield Token(self.line, "float", val)
            # ... other cases ...
However, I realized that the PLY parser requires a token() method, so I wrote one that looks like this:
    def token(self):
        return next(self.scan())
Scanning with scan() on its own takes 124 ms on average in my tests, but when I hand the lexer to the PLY parser, parsing still hasn't started after several minutes. It appears that the problem is in my token() method.
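Here is a stripped-down reproduction of what may be going on, assuming the cause is that every call to scan() builds a brand-new generator. The free functions and the hard-coded list are stand-ins for illustration, not my real lexer:

```python
def scan():
    # Stand-in for the real scanner: yields three fake tokens.
    for tok in ["a", "b", "c"]:
        yield tok


def token():
    # scan() returns a fresh generator on every call,
    # so next() always produces the first item again.
    return next(scan())


first_three = [token() for _ in range(3)]
print(first_three)  # ['a', 'a', 'a']
```

If that is what is happening in my lexer, the parser keeps receiving the first token and never reaches the end of the input, which would match the hang.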
I also tried renaming scan() to token() so that the generator method itself would serve as the interface, but then Python raises something like:
    AttributeError: 'generator' object has no attribute 'type'
So it appears that PLY needs a method that will return a single token at a time.
Is there any way to rewrite the token() method so that it returns the next item from scan() each time it is called, without being this slow?
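To make the question concrete, this is roughly the shape I'm after: a minimal sketch that creates the generator once, stores it, and advances it on each token() call. The simplified Token and Lexer classes, the words argument, and the _tokens attribute are hypothetical placeholders, not my actual code. It relies on PLY treating a None return value from token() as the end of input.

```python
class Token:
    """Hypothetical token; the parser reads .type and .value."""

    def __init__(self, lineno, type, value):
        self.lineno = lineno
        self.type = type
        self.value = value


class Lexer:
    def __init__(self, words):
        self.words = words    # placeholder for the real input
        self.line = 1
        self._tokens = None   # the generator, created once on first use

    def scan(self):
        # Placeholder scanner: every word becomes an "ident" token.
        for word in self.words:
            yield Token(self.line, "ident", word)

    def token(self):
        if self._tokens is None:
            self._tokens = self.scan()
        # The default argument makes next() return None once the
        # generator is exhausted instead of raising StopIteration.
        return next(self._tokens, None)


lexer = Lexer(["x", "y"])
values = [lexer.token().value, lexer.token().value]
end = lexer.token()
```

Here values holds the two words in order and end is None, so each call advances the same generator rather than restarting the scan.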