
I am attempting to understand how ANTLR4 handles errors in a Python environment. My final code needs to detect and report any data in the file that is not valid regardless of where it appears. As part of this effort I am using the examples in the py3antlr4book to try some basic scenarios. Specifically, I used the example in the 01-Hello directory and tried two different input files with bogus entries added:

Hello.g4

grammar Hello;            // Define a grammar called Hello
r  : 'hello' ID ;         // match keyword hello followed by an identifier
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines, \r (Windows)

bogus_first.txt

bogus
hello world

Output

line 1:0 extraneous input 'bogus' expecting 'hello'
(r bogus hello world)

bogus_last.txt

hello world
bogus

Output

(r hello world)

The output from bogus_first.txt makes a lot of sense to me. It errored, and it indicated where the error is. The output from bogus_last.txt didn't error and didn't indicate there was some sort of bad input in the data. This is surprising to me at least. I tried using this article's suggestion of adding an ErrorListener, but that didn't seem to catch the bogus entry. I also tried adding an ErrorStrategy, but that didn't seem to catch the bogus entry either.

Below is the code I used to implement the ErrorListener and ErrorStrategy. The value printed from inErrorRecoveryMode didn't seem to correspond to the line I wanted, but I am not sure whether I am printing the right data.

What do I need to change about my testbench in order to be able to error on something like the example bogus_last.txt?

test_hello.py

import sys
from antlr4 import *
from HelloLexer import HelloLexer
from HelloParser import HelloParser
from antlr4.error.ErrorListener import ErrorListener
from antlr4.error.ErrorStrategy import DefaultErrorStrategy

class MyErrorListener( ErrorListener ):

    def __init__(self):
        super().__init__()

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        raise Exception("Oh no!!")

    def reportAmbiguity(self, recognizer, dfa, startIndex, stopIndex, exact, ambigAlts, configs):
        raise Exception("Oh no!!")

    def reportAttemptingFullContext(self, recognizer, dfa, startIndex, stopIndex, conflictingAlts, configs):
        raise Exception("Oh no!!")

    def reportContextSensitivity(self, recognizer, dfa, startIndex, stopIndex, prediction, configs):
        raise Exception("Oh no!!")

class MyErrorStrategy(DefaultErrorStrategy):

    def __init__(self):
        super().__init__()

    def reset(self, parser):
        raise Exception("Oh no!!")

    def recoverInline(self, parser):
        raise Exception("Oh no!!")

    def recover(self, parser, excp):
        raise Exception("Oh no!!")

    def sync(self, parser):
        raise Exception("Oh no!!")

    def inErrorRecoveryMode(self, parser):
        ctx = parser._ctx
        print(self.lastErrorIndex)
        return super().inErrorRecoveryMode(parser)

    def reportError(self, parser, excp):
        raise Exception("Oh no!!")


def main(argv):
    input = FileStream(argv[1])
    lexer = HelloLexer(input)
    stream = CommonTokenStream(lexer)
    parser = HelloParser(stream)
    parser.addErrorListener( MyErrorListener() )
    parser._errHandler = MyErrorStrategy()
    tree = parser.r()
    print(tree.toStringTree(recog=parser))

if __name__ == '__main__':
    main(sys.argv)
ech3

1 Answer


The input:

hello world
bogus

didn't produce an error because the parser successfully parses hello world using the production r : 'hello' ID ; and then stops. Nothing in the rule tells the parser to consume all the tokens from the token stream. To force it to do that, add the EOF token to the end of your rule:

r  : 'hello' ID EOF;
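To make the effect of the trailing EOF concrete, here is a toy model in plain Python. It is not how ANTLR is implemented: `tokenize` and `parse_r` are hypothetical helpers that only mimic the shape of rule r, and the error text is made up for the sketch.

```python
# Toy model of rule r, NOT ANTLR's algorithm. All names here are hypothetical.

def tokenize(text):
    # Whitespace is skipped, as in the WS rule; an explicit end marker is appended.
    # This toy does not validate characters the way the real lexer would.
    return text.split() + ["<EOF>"]


def parse_r(tokens, require_eof):
    """Match 'hello' ID, optionally followed by EOF.

    Returns None on success, or a message describing the first mismatch.
    """
    expected = ["hello", "ID"]
    if require_eof:
        expected.append("<EOF>")
    for pos, want in enumerate(expected):
        tok = tokens[pos]
        if want == "ID":
            # The keyword and the end marker are not identifiers.
            matched = tok not in ("hello", "<EOF>")
        else:
            matched = tok == want
        if not matched:
            return f"unexpected {tok!r}, expecting {want}"
    # Any tokens after the last expected element are never looked at.
    return None


src = "hello world\nbogus"
print(parse_r(tokenize(src), require_eof=False))
print(parse_r(tokenize(src), require_eof=True))
```

Without the end marker in the rule, the toy parser returns as soon as it has matched two tokens and never inspects bogus. With it, the third expected element is compared against bogus and the mismatch is reported.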

Then input like:

hello world
bogus

will produce an error. By default, though, this error is only printed to your stderr stream, and the parser tries to recover and continue parsing. To make it fail instead, do something like this:

import antlr4
from antlr4.error.ErrorListener import ErrorListener

from HelloLexer import HelloLexer
from HelloParser import HelloParser


class BailOnErrorListener(ErrorListener):
    def syntaxError(self, recognizer, offending_symbol, line: int, column: int, msg, error):
        raise RuntimeError(f'msg: {msg}')


def main(src):
    lexer = HelloLexer(antlr4.InputStream(src))
    parser = HelloParser(antlr4.CommonTokenStream(lexer))
    parser.removeErrorListeners()  # drop the default listener that prints to stderr
    parser.addErrorListener(BailOnErrorListener())  # raise on the first syntax error
    tree = parser.r()
    print(tree.toStringTree(recog=parser))


if __name__ == '__main__':
    src = "hello world\nbogus"
    main(src)

Invoking parser.r() will then fail:

Traceback (most recent call last):
  ...
RuntimeError: msg: extraneous input 'bogus' expecting <EOF>
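If the goal is to report every problem in the file, not only the first, a variation is to collect the messages and let the parser keep recovering. The sketch below makes some assumptions: `CollectingErrorListener` and `install` are hypothetical names, and the `ImportError` fallback exists only so the snippet can be run without the antlr4 runtime installed.

```python
try:
    from antlr4.error.ErrorListener import ErrorListener
except ImportError:
    # Stand-in base class so the sketch runs without the runtime (assumption).
    class ErrorListener:
        pass


class CollectingErrorListener(ErrorListener):
    """Hypothetical helper: records each syntax error without raising."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        # Same "line L:C message" layout as the default console output.
        self.errors.append(f"line {line}:{column} {msg}")


def install(listener, *recognizers):
    """Replace the existing listeners on each lexer or parser with `listener`."""
    for recognizer in recognizers:
        recognizer.removeErrorListeners()
        recognizer.addErrorListener(listener)
```

With the generated classes, this would be used as `install(listener, lexer, parser)` before calling `parser.r()`; a non-empty `listener.errors` afterwards means the input was invalid. Passing the lexer as well matters because characters that match no lexer rule are reported through the lexer's listeners, not the parser's.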
Bart Kiers