lexer error-handling PLY Python

Question

The t_error() function is used to handle lexing errors that occur when illegal characters are detected. My question is: How can I use this function to get more specific information on errors? Like error type, in which rule or section the error appears, etc.

score 3 · Answer 1 · answered Nov 28 '11 at 03:45

In general, there is only very limited information available to the t_error() function. As input, it receives a token object where the value has been set to the remaining input text. Analysis of that text is entirely up to you. You can use the t.lexer.skip(n) function to have the lexer skip ahead by a certain number of characters and that's about it.
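As a minimal sketch of that kind of analysis, the handler below records the offending character with its line and column, then skips it. The `errors` list and the `find_column` helper are illustrative names and not part of PLY. The handler assumes the token carries `lineno`, `lexpos`, and a `lexer` whose `lexdata` attribute holds the full input string.

```python
# Illustrative module-level list for collected diagnostics; PLY does not provide it.
errors = []

def find_column(text, lexpos):
    """Return the 1-based column of lexpos within its line of text."""
    line_start = text.rfind("\n", 0, lexpos) + 1
    return lexpos - line_start + 1

def t_error(t):
    # t.value holds all of the remaining input; its first character is the
    # one that matched no token rule.
    bad_char = t.value[0]
    column = find_column(t.lexer.lexdata, t.lexpos)
    errors.append((t.lineno, column, bad_char))
    # Skip the offending character so scanning can resume after it.
    t.lexer.skip(1)
```

Defined in the same module as the token rules, a function named `t_error` is picked up when the lexer is built. Since it returns nothing, no token is handed to the parser for the bad character.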

There is no notion of an "error type" other than the fact that there is an input character that does not match the regular expression of any known token. Since the lexer is decoupled from the parser, there is no direct way to get any information about the state of the parsing engine or to find out what grammar rule is being parsed. Even if you could get the state (which would simply be the underlying state number of the LALR state machine), interpretation of it would likely be very difficult since the parser could be in the intermediate stages of matching dozens of possible grammar rules looking for reduce actions.

My advice is as follows: If you need additional information in the t_error() function, you should set up some kind of object that is shared between the lexer and parser components of your code. You should explicitly make different parts of your compiler update that object as needed (e.g., it could be updated in specific grammar rules).
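One way this might look, as a rough sketch: a small context object that a grammar action updates and that `t_error()` reads. `CompileContext`, `context`, `current_rule`, and the `assignment` rule with its `NAME`, `EQUALS`, and `expression` symbols are all hypothetical names chosen for illustration.

```python
class CompileContext:
    """Hypothetical shared state; none of these names come from PLY."""
    def __init__(self):
        self.current_rule = None   # updated explicitly by grammar actions
        self.errors = []           # diagnostics collected by the lexer

context = CompileContext()

def p_assignment(p):
    "assignment : NAME EQUALS expression"
    # Record which rule was just reduced so later diagnostics can mention it.
    context.current_rule = "assignment"
    p[0] = ("assign", p[1], p[3])

def t_error(t):
    # Combine what the lexer knows (line, bad character) with whatever the
    # parser side last wrote into the shared object.
    context.errors.append(
        "line %d: illegal character %r (last reduced rule: %s)"
        % (t.lineno, t.value[0], context.current_rule))
    t.lexer.skip(1)
```

Grammar actions run only when a rule is reduced, so `current_rule` names the most recently completed rule, not necessarily the one the parser is in the middle of matching. Treat it as a hint for the error message, not as the exact parser position.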

Just as an aside, there are usually very few courses of action for a bad token. Essentially, you're getting input text that doesn't match any known part of the language alphabet (e.g., no known symbol). As such, there's not even any kind of token value you can give to the parser. Usually, the only course of action is to report the bad input, throw it out, and continue.

As a followup to Raymond's answer, I would also not advise modifying any attribute of the lexer object in t_error().

Raymond Hettinger · Answer 2 · 2011-11-27T07:29:05.053

Ply includes an example ANSI-C style lexer in a file called cpp.py. It has an example of how to extract some information out of t_error():

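def t_error(t):
    # Turn the offending character into a one-character token of its own.
    t.type = t.value[0]
    t.value = t.value[0]
    # Advance past that character so lexing can continue.
    t.lexer.skip(1)
    # Returning the token hands it on to whoever is consuming the lexer.
    return t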

In that function, you can also access the lexer's public attributes:

lineno - Current line number
lexpos - Current position in the input string

There are also some other attributes that aren't listed as public but may provide some useful diagnostics:

lexstate - Current lexer state
lexstatestack - Stack of lexer states
lexstateinfo - State information
lexerrorf - Error rule (if any)
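A small sketch of reading these attributes for diagnostics, without modifying any of them. `describe_lexer_state` is a made-up helper name. The non-public attributes are fetched with `getattr` and a default, on the assumption that they may differ between PLY versions.

```python
def describe_lexer_state(t):
    """Collect read-only diagnostics from the lexer attached to token t."""
    lexer = t.lexer
    return {
        "lineno": lexer.lineno,
        "lexpos": lexer.lexpos,
        # Not listed as public; treat these as version-dependent.
        "lexstate": getattr(lexer, "lexstate", None),
        "lexstatestack": list(getattr(lexer, "lexstatestack", [])),
    }

def t_error(t):
    info = describe_lexer_state(t)
    print("Illegal character %r at line %d, offset %d, lexer state %r"
          % (t.value[0], info["lineno"], info["lexpos"], info["lexstate"]))
    # Skip the bad character; the attributes above are only read, never set.
    t.lexer.skip(1)
```

The lexer state is mainly useful when the lexer defines several states (for example, a separate state for string literals or comments), since it shows which set of token rules was active when the bad character appeared.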

score 1 · Answer 3 · answered Mar 09 '12 at 10:42


There is indeed a way of managing errors in PLY. Take a look at this very interesting presentation:

http://www.slideshare.net/dabeaz/writing-parsers-and-compilers-with-ply

and at chapter 6.8.1 of

http://www.dabeaz.com/ply/ply.html#ply_nn3


nios

