How to write a PLY interface for a hand-written lexer?

Question

I'm writing a compiler in Python, and I wrote the lexer by hand because I can't figure out how to handle indentation in PLY. My lexer is a generator that uses yield statements, like so:

def scan(self):
    ...
    for i in tokens:
        if i[0]:
            # Identifier or keyword
            yield Token(self.line, i[0] if i[0] in keywords else "ident", i[0])
        elif i[1]:
            # Float literal, optionally with an exponent part
            if "e" in i[1]:
                base, exp = i[1].split("e")
                val = float(base) * 10 ** int(exp)
            else:
                val = float(i[1])
            yield Token(self.line, "float", val)
        # ... other cases ...

However, I realized that the PLY parser requires a token method, so I made one that looks like this:

def token(self):
    return next(self.scan())

Scanning with scan() alone takes 124 ms on average in my tests, but when I use the PLY parser, parsing still hasn't started after a few minutes. It appears that my token() method is the problem.

I also tried renaming the scan() method so that it would serve as the interface directly. Python then raises something like

AttributeError: 'generator' object has no attribute 'type'

So it appears that PLY needs a method that returns a single token per call.

Is there any way to rewrite the token() method so that it returns the next item from scan() without being that slow?

By the way, you can use PLY for most of the lexing even if you want to parse indentation. Just carry all (leading?) whitespace and newlines through the lexer and postprocess the token stream to insert proper INDENT/DEDENT tokens. When I did this, I didn't use PLY for the parsing step, so I didn't have to integrate it and hence can't answer this question. But it does make writing the lexer easier IMHO. — , Mar 31 '12 at 17:28
@delnan That could _probably_ work, but what I also can't figure out is how to return multiple tokens for that one rule to process indentation. The rest I could code already. — Sammi De Guzman, Mar 31 '12 at 17:47
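
The post-processing idea from the comments can be sketched as a generator that wraps the raw token stream; because it is a generator, it can yield several INDENT/DEDENT tokens for a single line. This is only a sketch under assumptions that are not in the original posts: the `Tok` class, the token type names `WS` and `NEWLINE`, and the function name `add_indentation` are all hypothetical, and the raw lexer is assumed to emit one `WS` token for leading whitespace and one `NEWLINE` token per line end.

```python
class Tok:
    """Hypothetical token object with the attributes a parser usually reads."""
    def __init__(self, type, value, lineno):
        self.type, self.value, self.lineno = type, value, lineno


def add_indentation(tokens):
    """Wrap a raw token stream and insert INDENT/DEDENT tokens.

    Assumes (hypothetically) that leading whitespace arrives as a single
    WS token and that every line ends with a NEWLINE token.
    """
    levels = [0]           # stack of currently open indentation widths
    pending = 0            # width of the leading whitespace on this line
    at_line_start = True
    last_line = 1
    for tok in tokens:
        last_line = tok.lineno
        if tok.type == "WS":
            if at_line_start:
                pending = len(tok.value)
            continue       # whitespace itself never reaches the parser
        if tok.type == "NEWLINE":
            at_line_start, pending = True, 0
            yield tok      # blank lines do not change the indentation
            continue
        if at_line_start:
            if pending > levels[-1]:
                levels.append(pending)
                yield Tok("INDENT", None, tok.lineno)
            while pending < levels[-1]:
                levels.pop()
                yield Tok("DEDENT", None, tok.lineno)
            if pending != levels[-1]:
                raise IndentationError("inconsistent dedent on line %d" % tok.lineno)
            at_line_start = False
        yield tok
    # Close any blocks that are still open at end of input.
    while len(levels) > 1:
        levels.pop()
        yield Tok("DEDENT", None, last_line)
```

Tabs versus spaces and line continuations are deliberately ignored here; a real implementation would have to decide how to treat them.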

score 1 · Accepted Answer · answered Mar 31 '12 at 17:07

You need to save your generator somewhere, like:

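def start(self):
    # Create the generator once and keep it around
    self.lexer = self.scan()

def token(self):
    # Each call resumes the same generator where it left off
    return next(self.lexer)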

Disclaimer: I don't know anything about PLY.
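
To spell out why this helps: calling scan() creates a fresh generator every time, so `next(self.scan())` restarts the scan and hands back the first token on every call, and the parser never sees the end of the input. A self-contained sketch of the stored-generator approach follows. The `HandLexer` class, its `words` argument, and the simplified `scan()` are hypothetical stand-ins for the asker's lexer. Returning `None` once the input is exhausted is, as far as I know, how PLY expects a lexer to signal end of input, but check that against the PLY documentation.

```python
class Token:
    """Plain token object exposing .type, .value and .lineno attributes."""
    def __init__(self, lineno, type, value):
        self.lineno = lineno
        self.type = type
        self.value = value


class HandLexer:
    """Hypothetical hand-written lexer with a token() method."""
    def __init__(self, words):
        self.words = words
        self.line = 1
        self._gen = None       # the generator is created lazily, exactly once

    def scan(self):
        # Stand-in for the real scanner: every word becomes an "ident" token.
        for word in self.words:
            yield Token(self.line, "ident", word)

    def token(self):
        if self._gen is None:
            self._gen = self.scan()
        # The default of None avoids a StopIteration at end of input.
        return next(self._gen, None)


lexer = HandLexer(["a", "b"])
first, second, end = lexer.token(), lexer.token(), lexer.token()
```

With PLY, such an object would presumably be handed to the parser through the `lexer` argument of `parse()`; that part is not tested here.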

georg


It works, but it definitely makes the program run a bit slower, by about 5 ms. But that doesn't really bother me. Thanks! – Sammi De Guzman Mar 31 '12 at 21:54
