I'm trying to learn how to use ANTLR4 in Unity. I found the following code in the ActionLexer class of another program:

private static string _serializeATN()
{
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.Append("\u0003а훑舆괭䐗껱趀ꫝ\u0002\u000e");
        stringBuilder.Append("\u00a0\b\u0001\u0004\u0002\t\u0002\u0004\u0003\t\u0003\u0004\u0004\t\u0004\u0004\u0005\t\u0005\u0004\u0006");
        stringBuilder.Append("\t\u0006\u0004\a\t\a\u0004\b\t\b\u0004\t\t\t\u0004\n\t\n\u0004\v\t\v\u0004\f\t\f");
        stringBuilder.Append("\u0004\r\t\r\u0004\u000e\t\u000e\u0004\u000f\t\u000f\u0004\u0010\t\u0010\u0004\u0011\t\u0011\u0004");
        stringBuilder.Append("\u0012\t\u0012\u0003\u0002\u0003\u0002\u0003\u0003\u0003\u0003\u0003\u0004\u0003\u0004\u0003\u0005\u0003\u0005\u0003");
        stringBuilder.Append("\u0006\u0003\u0006\u0003\a\u0003\a\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b");   
        ...
        return stringBuilder.ToString();
}

I then copied that code into my Unity project and debugged it. The result is a strange string:

+       stringBuilder   "а훑舆괭䐗껱趀ꫝ \b\t\t\t\t\t\a\t\a\b\t\b\t\t\t\n\t\n\v\t\v\f\t\f" System.Text.StringBuilder

I want to know why this happens. What is the role of this function?

Joe Sewell
xcom2000
  • I would assume it's encoding something non-text-like into text. Maybe related: https://stackoverflow.com/questions/41306505/antlr4-what-does-atn-stand-for – Joe Sewell Aug 20 '21 at 14:38
  • Also, to clarify, this is in code that ANTLR generates? Is that what you mean by "other program"? – Joe Sewell Aug 20 '21 at 14:44
  • These are escape sequences, e.g. `\t` is a TAB, `\n` a new line, etc. – derHugo Aug 20 '21 at 14:46
  • The name suggests that it is serialized binary data encoding some syntax tree. – Olivier Jacot-Descombes Aug 20 '21 at 15:02
  • Yes, I found the code: `ActionParser.MainContext t = new ActionParser(new CommonTokenStream(new ActionLexer(new AntlrInputStream(str)))) { BuildParseTree = true }.main();` It's the same as the ANTLR logic. – xcom2000 Aug 20 '21 at 15:18

2 Answers

The ATN (Augmented Transition Network) is the internal network used by the ATN interpreter to execute the parser and lexer state machines. ANTLR generates this structure from the grammar it is given, and it sits at the heart of the ANTLR machinery.

The generated parser and lexer need their ATN to work properly. Because the generated files are text, the network has to be serialised into a text string so that it can be written into those files. This string is then de-serialised when the parsing application starts up, to rebuild the original ATN in memory. In short: it's not text per se, but binary data stored as text.
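
The "binary data stored as text" idea can be illustrated with a minimal sketch. This is not ANTLR's actual serialisation format (which has more structure, such as a version number and an identifier at the start); `Pack` and `Unpack` are hypothetical helpers that only show how a list of small numbers can travel inside a string, one `char` per value, and be read back unchanged.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AtnStringDemo
{
    // Hypothetical helper: store each 16-bit value as one char of a string.
    public static string Pack(ushort[] values)
    {
        var sb = new StringBuilder(values.Length);
        foreach (ushort v in values)
        {
            sb.Append((char)v);
        }
        return sb.ToString();
    }

    // Hypothetical helper: read the numbers back out of the string.
    public static ushort[] Unpack(string text)
    {
        return text.Select(c => (ushort)c).ToArray();
    }

    public static void Main()
    {
        // Arbitrary sample values, not a real ATN.
        ushort[] data = { 3, 4, 2, 9, 2, 4, 3, 9, 3 };

        string packed = Pack(data);          // looks like garbage in a debugger
        ushort[] restored = Unpack(packed);  // but the numbers are intact

        Console.WriteLine(string.Join(",", restored)); // prints 3,4,2,9,2,4,3,9,3
    }
}
```

Viewed as text, `packed` consists mostly of control characters, which is why a debugger shows it as a strange string even though nothing is wrong with it.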

The ATN belongs to the internals of the parser/lexer implementation and you can safely ignore it for most purposes.

Mike Lischke

You are looking at non-printable Unicode characters. Quite what they are doing here is a bit of a mystery.

  • \u0002 is ASCII code 2 (STX)
  • \u0003 is ASCII code 3 (ETX)
  • \t is a tab character
  • \a is the alert (bell) character, ASCII code 7 (BEL)

https://www.rapidtables.com/code/text/ascii-table.html
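
To see what the escapes in such a string stand for, it can help to print the numeric value of each character instead of the character itself. The following is a minimal sketch; `Describe` is a hypothetical helper name, and the sample string is just a handful of the escapes that appear in the question.

```csharp
using System;

public static class EscapeDemo
{
    // Hypothetical helper: list every char as "U+XXXX=decimal".
    public static string Describe(string text)
    {
        var parts = new string[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            int code = text[i];
            parts[i] = $"U+{code:X4}={code}";
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        // STX, ETX, tab, alert (bell), backspace
        Console.WriteLine(Describe("\u0002\u0003\t\a\b"));
        // prints: U+0002=2 U+0003=3 U+0009=9 U+0007=7 U+0008=8
    }
}
```

Each escape is just a compact way of writing a small integer, so `\t` and `\u0009` are the same character.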

Neil
  • I guess it is some binary data / control characters – derHugo Aug 20 '21 at 15:12
  • That program uses filehelper to read and convert text. The text looks like this: { "LogicalNode" : [ { "ClickNode" : { "MultiAction" : [ { "TalkAction" : "t2102201_311"} ]} } ], 0}. – xcom2000 Aug 20 '21 at 15:16