
I am using JavaParser (https://github.com/javaparser/javaparser) to parse a large number of Java source files from several GitHub users and collect statistics on them (this is for a university project). Everything works fine until one particular source file produces this error:

Exception in thread "main" com.github.javaparser.TokenMgrError: Lexical error at line 6, column 2.  Encountered: <EOF> after : ""

This is what is written in that file:

public class Test {
    /**<caret>
    public void foo() {
    }
}

This is how I parse the file:

...

new NodeIterator(new NodeIterator.NodeHandler() {
    @Override
    public boolean handle(Node node) {
        ...
    }
}).explore(JavaParser.parse(file));

...

This is the NodeIterator class:

public class NodeIterator {
    public interface NodeHandler {
        boolean handle(Node node);
    }

    private NodeHandler nodeHandler;

    public NodeIterator(NodeHandler nodeHandler) {
        this.nodeHandler = nodeHandler;
    }

    public void explore(Node node) {
        if (nodeHandler.handle(node)) {
            for (Node child : node.getChildrenNodes()) {
                explore(child);
            }
        }
    }
}

I understand why this file fails, but the error aborts the entire run. I am parsing many files inside a for loop, so how can I keep parsing the remaining files? Alternatively, is there a tool that checks whether a Java file is "well written" before I parse it?

1 Answer


You can't solve "the problem" because it is not a problem. The error is correct: the source code that you are trying to parse is invalid. It contains a comment that is not terminated before the end of the file.

If you compile the same source code with javac, you also get an error. Its message is more detailed than the one from JavaParser, but it is still an error, because the defect is in the source file itself.

Javac output:

Test.java:2: error: unclosed comment
    /**<caret>
    ^
Test.java:6: error: reached end of file while parsing
2 errors
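
The same diagnostics can be collected from code rather than from the command line, which is one way to test whether a file is syntactically valid before handing it to another parser. The following is only a sketch, assuming a JDK (not a bare JRE) is available at runtime; it relies on the standard `javax.tools` API plus the JDK-specific `com.sun.source.util.JavacTask`. The names `SyntaxCheck`, `StringSource`, and `isSyntacticallyValid` are hypothetical and introduced here for illustration.

```java
import com.sun.source.util.JavacTask;
import java.net.URI;
import java.util.Collections;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class SyntaxCheck {

    /** In-memory source file, so the sketch needs nothing on disk (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns true if javac can lex and parse the source without reporting an error. */
    static boolean isSyntacticallyValid(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null,
                Collections.singletonList(new StringSource(className, code)));
        try {
            // Parse only: no symbol resolution, so missing dependencies do not matter.
            task.parse();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + className, e);
        }
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String broken = "public class Test {\n    /**\n    public void foo() {\n    }\n}\n";
        String valid = "public class Test {\n    public void foo() {\n    }\n}\n";
        System.out.println("broken valid? " + isSyntacticallyValid("Test", broken));
        System.out.println("valid valid?  " + isSyntacticallyValid("Test", valid));
    }
}
```

Stopping after `parse()` is a deliberate choice here: a full compile would also fail on files whose imports cannot be resolved, which is a different question from whether the file is well formed.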
Erwin Bolwidt
  • So, is there a way to check whether the file I am parsing is well written, or to skip the error and continue parsing? –  Jul 21 '16 at 14:50
  • It doesn't look like this library has recovery capabilities after a token parsing error, and in any case hitting the end of file while tokenizing is hard to recover from. Your best bet is to either ignore the rest of the file or the complete file. – Erwin Bolwidt Jul 21 '16 at 15:20
  • My problem is that I am parsing a lot of files and each of these errors blocks the entire run, so do you know how to keep parsing the rest of the files, or of a tool that checks whether a file is "well written"? –  Jul 24 '16 at 11:46
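
Ignoring the complete file, as suggested in the comments, can be done by wrapping each parse call in its own try/catch inside the loop. The sketch below is not JavaParser-specific: `FileParser` is a hypothetical stand-in for a call such as `JavaParser.parse(file)`, and `SkipBrokenFiles` and `parseAll` are likewise illustrative names. The point it shows is that `TokenMgrError` appears to be an `Error` rather than an `Exception` (as is usual for JavaCC-generated lexers), so a plain `catch (Exception e)` would not stop it from ending the loop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SkipBrokenFiles {

    /** Hypothetical stand-in for a call such as JavaParser.parse(file). */
    interface FileParser {
        void parse(File file) throws Exception;
    }

    /**
     * Parses every file in turn, records the ones that fail, and keeps going.
     * Returns the files that could not be parsed.
     */
    static List<File> parseAll(List<File> files, FileParser parser) {
        List<File> failed = new ArrayList<>();
        for (File file : files) {
            try {
                parser.parse(file);
            } catch (VirtualMachineError e) {
                // Out of memory and similar conditions should not be swallowed.
                throw e;
            } catch (Exception | Error e) {
                // Catching Error as well, because a lexer error may not be an Exception.
                // In real code, naming the specific error type would be tighter.
                System.err.println("Skipping " + file + ": " + e.getMessage());
                failed.add(file);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<File> files = Arrays.asList(
                new File("A.java"), new File("Broken.java"), new File("B.java"));
        // Simulated parser: fails with an Error on one file, succeeds on the others.
        FileParser fake = file -> {
            if (file.getName().startsWith("Broken")) {
                throw new Error("Lexical error (simulated)");
            }
        };
        List<File> failed = parseAll(files, fake);
        System.out.println("Skipped: " + failed);
    }
}
```

Returning the list of failed files, rather than only logging them, makes it possible to report in the statistics how many sources were excluded.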