
I'm working on a school project: we are building a static code analyzer. One requirement is to analyse C# code from Java, which has been going well so far with ANTLR.

I wrote some example C# code in Visual Studio to scan with ANTLR, and I analyse every C# file in the solution. But it does not work. I get what looks like a memory leak, with this error message:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.antlr.runtime.Lexer.emit(Lexer.java:151)
    at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
    at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
    at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)

After a while I suspected an encoding issue, because all the files are saved as UTF-8, and I think the lexer can't read the encoded stream. So I opened Notepad++, changed the encoding of every file to ANSI, and then it worked. I don't really understand what ANSI means: is it a single character set or some kind of organisation?

I want to convert the input from whatever encoding it has (probably UTF-8) to this ANSI encoding, so that I don't run out of memory anymore.
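For context on the ANSI question: in Notepad++, "ANSI" is not one fixed character set but the system's legacy Windows code page (windows-1252 on most Western European and US installations); the name is historical. Converting a file to ANSI also drops the UTF-8 byte order mark (BOM, the bytes EF BB BF) that Visual Studio typically writes at the start of source files. The JDK-only sketch below assumes such a file and shows what the BOM turns into after decoding, which is what the lexer would see in front of the first token. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Bytes of a tiny C# file saved as "UTF-8 with BOM": EF BB BF, then "class A {}"
    static final byte[] WITH_BOM = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
        'c', 'l', 'a', 's', 's', ' ', 'A', ' ', '{', '}'
    };

    // Decode the raw bytes with the given charset, roughly as a Reader-based stream would
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // As UTF-8, the BOM becomes one invisible U+FEFF character in front of "class"
        String asUtf8 = decode(WITH_BOM, StandardCharsets.UTF_8);
        // As a single-byte legacy charset (ISO-8859-1 here; windows-1252 maps these
        // three bytes the same way), the BOM becomes three visible junk characters
        String asLatin1 = decode(WITH_BOM, StandardCharsets.ISO_8859_1);

        System.out.printf("UTF-8:      first char U+%04X, %d chars%n",
                (int) asUtf8.charAt(0), asUtf8.length());
        System.out.printf("ISO-8859-1: first chars U+%04X U+%04X U+%04X, %d chars%n",
                (int) asLatin1.charAt(0), (int) asLatin1.charAt(1),
                (int) asLatin1.charAt(2), asLatin1.length());
    }
}
```

Running it reports U+FEFF and 11 chars for UTF-8, and U+00EF U+00BB U+00BF and 13 chars for ISO-8859-1. Either way, a grammar that does not expect these characters sees extra input before `class`.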

This is the code that creates the lexer and parser:

import java.io.*;
import org.antlr.runtime.*;

InputStream inputStream = new FileInputStream(new File(filePath));
// No charset is given here, so the bytes are decoded with the platform default encoding
CharStream charStream = new ANTLRInputStream(inputStream);
CSharpLexer cSharpLexer = new CSharpLexer(charStream);
CommonTokenStream commonTokenStream = new CommonTokenStream(cSharpLexer);
CSharpParser cSharpParser = new CSharpParser(commonTokenStream);

  • Does anyone know how to make ANTLR read the InputStream with the right encoding?
  • And what does Notepad++ actually do when I change the encoding to ANSI?

2 Answers


When reading text files you should always set the encoding explicitly. Try your examples with the following change:

CharStream charStream = new ANTLRInputStream(inputStream, "UTF-8");
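The same idea can be tried without ANTLR on the classpath. The JDK-only sketch below decodes a source file explicitly as UTF-8 instead of relying on the platform default; the class and method names are hypothetical. The resulting String could then be handed to ANTLR 3 via `new ANTLRStringStream(text)`. Note that an explicit UTF-8 decode keeps a leading BOM as the character U+FEFF, so for files saved with a BOM this alone may not be enough.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Read the whole file and decode it as UTF-8, independent of the platform default
    // charset (which on Windows is usually the "ANSI" code page)
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 C# file containing a non-ASCII identifier ("Café")
        Path tmp = Files.createTempFile("example", ".cs");
        Files.write(tmp, "class Caf\u00E9 {}".getBytes(StandardCharsets.UTF_8));

        String text = readUtf8(tmp);
        System.out.println(text.length() + " chars: " + text);
        Files.delete(tmp);
    }
}
```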

Andrew T Finnell
  • I added an answer here for ANTLR4. http://stackoverflow.com/questions/28126507/antlr4-using-non-ascii-characters-in-token-rules/28129510#28129510 – Terence Parr Jan 24 '15 at 19:46

I solved this issue by wrapping the InputStream in a BufferedInputStream and then removing the byte order mark (BOM).

I guess my parser didn't like that encoding: I had also tried setting the encoding explicitly, and that alone didn't help.
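The fix described in this answer could look roughly like the sketch below, assuming the files are UTF-8 with an optional BOM. It is not necessarily the poster's exact code, the class and method names are hypothetical, and `readNBytes`/`readAllBytes` need Java 9 or later. In the question's code, the returned stream would be passed to `new ANTLRInputStream(in, "UTF-8")` in place of the raw `FileInputStream`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BomSkipper {
    // Wrap the stream in a BufferedInputStream and skip a leading UTF-8 BOM (EF BB BF)
    // if present; otherwise rewind so that no bytes of the source are lost
    static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(3);
        byte[] head = new byte[3];
        int n = in.readNBytes(head, 0, 3);
        boolean hasBom = n == 3
                && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF;
        if (!hasBom) {
            in.reset(); // not a BOM: go back to the first byte
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'c', 'l', 'a', 's', 's'};
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(file))) {
            // With ANTLR 3 this would be: new ANTLRInputStream(in, "UTF-8")
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(text + " (" + text.length() + " chars)");
        }
    }
}
```

Using `mark`/`reset` on the buffered stream means files without a BOM are passed through unchanged, so the same code path works for both kinds of file.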