So I've implemented a parser using PLY — but all the PLY documentation deals with parse and tokenization errors by printing out error messages. I'm wondering what the best way to implement non-fatal error-reporting is, at an API level, to the caller of the parser. Obviously the "non-fatal" restriction means exceptions are out — and it feels like I'd be misusing the warnings
module for parse errors. Suggestions?

2 Answers
PLY has a t_error() function that you can override in your parser to do whatever you want. The example provided in the documentation prints out an error message and skips the offending character - but you could just as easily update a list of encountered parsing failures, have a threshold that stops after X amount of failures, etc. - http://www.dabeaz.com/ply/ply.html
4.9 Error handling
Finally, the t_error() function is used to handle lexing errors that occur when illegal characters are detected. In this case, the t.value attribute contains the rest of the input string that has not been tokenized. In the example, the error function was defined as follows:
# Error handling rule
def t_error(t):
    print "Illegal character '%s'" % t.value[0]
    t.lexer.skip(1)
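For instance, the "threshold that stops after X failures" idea could be sketched with a small error collector. This is a stdlib-only illustration, not part of PLY; the `ErrorCollector` name and its methods are made up for the example, and in real code a `t_error` rule would call `collector.add(...)`:

```python
# Illustrative sketch (not part of PLY): accumulate lexing errors
# and signal the caller once a threshold is reached.

class ErrorCollector:
    def __init__(self, limit=10):
        self.limit = limit
        self.errors = []

    def add(self, message):
        """Record an error; return False once the threshold is hit."""
        self.errors.append(message)
        return len(self.errors) < self.limit

    def ok(self):
        return not self.errors

collector = ErrorCollector(limit=3)
for ch in "A@B":
    keep_going = collector.add("Illegal character '%s'" % ch)

print(collector.errors)
# A t_error rule could call collector.add(...) and stop feeding the
# lexer as soon as it returns False.
```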
You can utilize this by making your parser a class and storing error state within it - this is a very crude example since you'd have to make multiple MyLexer instances, then build() them, then utilize them for parsing if you wanted multiple lexers running concurrently.
You could tie the error storage to the `__hash__` of the lexer instance itself so you only have to build once. I'm hazy on the details of running multiple lexer instances within one class, but really this is just a rough example of how you can capture and report non-fatal errors.
I've modified the simple calculator class example from PLY's documentation for this purpose.
#!/usr/bin/python
import ply.lex as lex

class MyLexer:

    # List of token names. This is always required.
    tokens = (
        'NUMBER',
        'PLUS',
        'MINUS',
        'TIMES',
        'DIVIDE',
        'LPAREN',
        'RPAREN',
    )

    # Regular expression rules for simple tokens
    t_PLUS = r'\+'
    t_MINUS = r'-'
    t_TIMES = r'\*'
    t_DIVIDE = r'/'
    t_LPAREN = r'\('
    t_RPAREN = r'\)'

    # A regular expression rule with some action code.
    # Note the addition of the self parameter since we're in a class.
    def t_NUMBER(self, t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(self, t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore = ' \t'

    # Error handling rule: record the error instead of printing it
    def t_error(self, t):
        self.errors.append("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer (resets the per-instance error list)
    def build(self, **kwargs):
        self.errors = []
        self.lexer = lex.lex(module=self, **kwargs)

    # Test its output
    def test(self, data):
        self.errors = []
        self.lexer.input(data)
        while True:
            tok = self.lexer.token()
            if not tok:
                break
            print(tok)

    def report(self):
        return self.errors
Usage:
# Build the lexer and try it out
m = MyLexer()
m.build()            # Build the lexer
m.test("3 + 4 + 5")  # Test it
print(m.report())
m.test("3 + A + B")
print(m.report())
Output:
LexToken(NUMBER,3,1,0)
LexToken(PLUS,'+',1,2)
LexToken(NUMBER,4,1,4)
LexToken(PLUS,'+',1,6)
LexToken(NUMBER,5,1,8)
[]
LexToken(NUMBER,3,1,0)
LexToken(PLUS,'+',1,2)
LexToken(PLUS,'+',1,6)
["Illegal character 'A'", "Illegal character 'B'"]

- See the title of the question. I was asking how to implement error-reporting to the *caller of the parser*. Sure, `t_error`, `p_error` allow me to get the errors *within* the parser, but how do I *then* report the error back to the caller of the parser? Where would I put such a list, such that multiple parsers can exist concurrently without them interfering? The mechanics of getting the errors isn't the difficult part. – gsnedders Aug 05 '13 at 09:58
- Updated my answer to paint the picture more vividly. – synthesizerpatel Aug 05 '13 at 13:59
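To make the "report back to the caller" part concrete: one common pattern is to keep the error list as per-call state and hand it back from the parse entry point inside a small result object, so concurrent parser instances never interfere. The sketch below is stdlib-only and illustrative; the `ParseResult` and `MyParser` names are invented for the example, and the toy scanning loop stands in for what PLY's lexer/parser would actually do:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ParseResult:
    # What the caller receives: the parsed value (if any) plus
    # any non-fatal errors collected along the way.
    value: Optional[object]
    errors: List[str] = field(default_factory=list)

class MyParser:
    def parse(self, text):
        errors = []  # per-call state: concurrent parses can't interfere
        digits = []
        for ch in text:
            if ch.isdigit():
                digits.append(ch)
            elif not ch.isspace():
                # Record the error and keep going (non-fatal)
                errors.append("Illegal character '%s'" % ch)
        value = int(''.join(digits)) if digits else None
        return ParseResult(value, errors)

p = MyParser()
result = p.parse("12a3")
print(result.value)   # 123
print(result.errors)  # ["Illegal character 'a'"]
```

With PLY, `parse()` would populate the same kind of error list from `t_error`/`p_error` and return it alongside the parse tree, so the caller never has to scrape console output.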
Check out section 9.2 of the PLY documentation:

9.2 Run-time Debugging

To enable run-time debugging of a parser, use the debug option to parse. This option can either be an integer (which simply turns debugging on or off) or an instance of a logger object. For example:

log = logging.getLogger()
parser.parse(input, debug=log)

If a logging object is passed, you can use its filtering level to control how much output gets generated. The INFO level is used to produce information about rule reductions. The DEBUG level will show information about the parsing stack, token shifts, and other details. The ERROR level shows information related to parsing errors.

The logging module is part of CPython's standard library.
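Because `parse(..., debug=log)` just emits records through a standard logger, the caller can capture parse errors programmatically with an ordinary `logging.Handler` instead of reading console output. The sketch below is stdlib-only and only demonstrates the capture mechanism; the `ErrorListHandler` name is invented, and the simulated log calls stand in for what PLY would emit during a real `parser.parse(input, debug=log)`:

```python
import logging

class ErrorListHandler(logging.Handler):
    """Collect formatted ERROR-level records into a list."""
    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.errors = []

    def emit(self, record):
        self.errors.append(record.getMessage())

log = logging.getLogger("parser_debug")
log.setLevel(logging.DEBUG)
log.propagate = False           # keep records out of the root logger
handler = ErrorListHandler()
log.addHandler(handler)

# In real code this would be: parser.parse(input, debug=log)
# Here we simulate the kind of records PLY emits at each level.
log.error("Syntax error at token %s", "PLUS")
log.info("reduced rule expression -> NUMBER")  # below ERROR: filtered out

print(handler.errors)
```

After parsing, `handler.errors` holds only the ERROR-level messages, which the parser's caller can inspect without touching stdout.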
