The t_error() function is used to handle lexing errors that occur when illegal characters are detected. My question is: How can I use this function to get more specific information on errors? Like error type, in which rule or section the error appears, etc.
3 Answers
In general, there is only very limited information available to the t_error() function. As input, it receives a token object where the value has been set to the remaining input text. Analysis of that text is entirely up to you. You can use the t.lexer.skip(n) function to have the lexer skip ahead by a certain number of characters and that's about it.
There is no notion of an "error type" other than the fact that there is an input character that does not match the regular expression of any known token. Since the lexer is decoupled from the parser, there is no direct way to get any information about the state of the parsing engine or to find out what grammar rule is being parsed. Even if you could get the state (which would simply be the underlying state number of the LALR state machine), interpretation of it would likely be very difficult since the parser could be in the intermediate stages of matching dozens of possible grammar rules looking for reduce actions.
My advice is as follows: If you need additional information in the t_error() function, you should set up some kind of object that is shared between the lexer and parser components of your code. You should explicitly make different parts of your compiler update that object as needed (e.g., it could be updated in specific grammar rules).
Just as aside, there are usually very few courses of action for a bad token. Essentially, you're getting input text that doesn't any known part of the language alphabet (e.g., no known symbol). As such, there's not even any kind of token value you can give to the parser. Usually, the only course of action is to report the bad input, throw it out, and continue.
As a followup to Raymond's answer, I would also not advise modifying any attribute of the lexer object in t_error().

- 416
- 2
- 3
Ply includes an example ANSI-C style lexer in a file called cpp.py. It has an example of how to extract some information out of t_error():
def t_error(t):
t.type = t.value[0]
t.value = t.value[0]
t.lexer.skip(1)
return t
In that function, you can also access the lexer's public attributes:
- lineno - Current line number
- lexpos - Current position in the input string
There are also some other attributes that aren't listed as public but may provide some useful diagnostics:
- lexstate - Current lexer state
- lexstatestack - Stack of lexer states
- lexstateinfo - State information
- lexerrorf - Error rule (if any)

- 216,523
- 63
- 388
- 485
There is indeed a way of managing errors in PLY, take a look at this very interesting resentation:
http://www.slideshare.net/dabeaz/writing-parsers-and-compilers-with-ply
and at chapter 6.8.1 of

- 131
- 1
- 1
- 7