I am trying to create a grammar for a format that follows a type-length-value convention. Can ANTLR4 read in a length value and then parse that many characters?
No.
From your question (which is very short, so I may be missing something), I gather you are mixing up grammar and encoding rules.
When you say type-length-value, that sounds like an encoding rule to me (how to serialize data). In my experience, you write this code yourself.
A grammar sits at a higher level: it describes how a piece of text is structured. ANTLR will help you break that text into tokens and then into a tree that you can navigate. This step only handles text: even if you went that way to solve your problem, you would still have to handle type, length and value yourself.
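To give an idea of what "writing this code yourself" can look like, here is a minimal sketch of a hand-written TLV reader. The layout is an assumption made up for illustration (1-byte type, 2-byte big-endian unsigned length, then that many value bytes), and the names `TlvRecord` and `TlvReader` are hypothetical; a real format will differ in field widths and byte order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical TLV layout, assumed for illustration only:
// 1-byte type, 2-byte big-endian unsigned length, then `length` value bytes.
final class TlvRecord {
    final int type;
    final byte[] value;

    TlvRecord(int type, byte[] value) {
        this.type = type;
        this.value = value;
    }
}

final class TlvReader {
    // Reads consecutive records until the input is exhausted.
    // Throws java.nio.BufferUnderflowException if a record is truncated.
    static List<TlvRecord> readAll(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        List<TlvRecord> records = new ArrayList<>();
        while (buf.hasRemaining()) {
            int type = buf.get() & 0xFF;          // unsigned byte
            int length = buf.getShort() & 0xFFFF; // unsigned 16-bit length
            byte[] value = new byte[length];
            buf.get(value);                       // read exactly `length` bytes
            records.add(new TlvRecord(type, value));
        }
        return records;
    }
}
```

The point of the sketch is that the number of bytes to read for the value is only known after the length field has been decoded, which is why this kind of code is usually written by hand rather than generated from a grammar.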
EDIT: With a bit of googling I found this: https://github.com/NickstaDB/SerializationDumper

YaFred
-
Thanks for the insightful comment; I think this answers my question already. I'm hoping you can provide some follow-up information. Specifically, I am trying to write a parser for Java serialization. Oracle has provided a BNF-like grammar here: https://docs.oracle.com/javase/8/docs/platform/serialization/spec/protocol.html. Some parts of this grammar specify the length of a string, followed by the string itself, which is not null-terminated. Is ANTLR not a realistic approach for parsing this format? Is this not a context-free grammar? – poke Jul 05 '18 at 00:10
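For illustration, the length-prefixed string case can be read by hand with the standard `java.io` classes. This is only a sketch and covers a single top-level short string: the stream starts with the magic number and version, then a `TC_STRING` tag, then a 2-byte length followed by that many bytes of modified UTF-8, which is the layout `DataInputStream.readUTF` expects. The class name `SerializedStringDemo` and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

// Hypothetical demo class: handles one top-level short String and nothing else.
final class SerializedStringDemo {
    // Produces a real serialization stream containing a single String.
    static byte[] serialize(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads the stream by hand: header, tag, then the length-prefixed string.
    static String readString(byte[] stream) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            int magic = in.readUnsignedShort();   // STREAM_MAGIC (0xACED)
            int version = in.readUnsignedShort(); // STREAM_VERSION
            if (magic != (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
                    || version != ObjectStreamConstants.STREAM_VERSION) {
                throw new IllegalArgumentException("not a serialization stream");
            }
            int tag = in.readUnsignedByte();
            if (tag != ObjectStreamConstants.TC_STRING) {
                throw new IllegalArgumentException("expected TC_STRING");
            }
            // readUTF consumes the 2-byte length and then exactly that many bytes.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `SerializedStringDemo.readString(SerializedStringDemo.serialize("hello"))` round-trips the string; the length field is what tells the reader where the string ends.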
-
I understand. My own experience was: https://github.com/yafred/asn1-tool where the distinction between grammar and encoding rules is very clear. I'm not sure the word grammar in Oracle's document is quite proper. Let me have a look ... – YaFred Jul 05 '18 at 07:37
-
I stand by my first answer. This is not text, and it can't be handled by a grammar parser. See my edited answer for a project similar to yours. – YaFred Jul 05 '18 at 09:20
-
I don't fully agree with @YaFred. Essentially, the tricky part is only the lexing: for text, one can typically tokenize the input stream by splitting it at whitespace separators, but this is not easily possible for binary TLV data. If, however, one can write a lexer that splits the TLV input stream into `Tag`, `Length` and `Value` tokens, then a parser would be able to parse this token sequence. (But then again, I am not a parser expert.) – Stefan D. Nov 18 '19 at 16:02
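A rough sketch of such a lexer, assuming a made-up layout with a 1-byte tag and a 1-byte length (the names `TlvToken` and `TlvLexer` are hypothetical). It emits a flat `TAG`, `LENGTH`, `VALUE` token sequence that a separate parser could then consume. Note that the lexer has to be stateful: the size of each `VALUE` token depends on the `LENGTH` token just read.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type for an assumed 1-byte tag, 1-byte length layout.
final class TlvToken {
    enum Kind { TAG, LENGTH, VALUE }

    final Kind kind;
    final byte[] bytes;

    TlvToken(Kind kind, byte[] bytes) {
        this.kind = kind;
        this.bytes = bytes;
    }
}

final class TlvLexer {
    // Splits the input into TAG, LENGTH, VALUE tokens, repeated until the end.
    // Throws java.nio.BufferUnderflowException on truncated input.
    static List<TlvToken> tokenize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        List<TlvToken> tokens = new ArrayList<>();
        while (buf.hasRemaining()) {
            tokens.add(take(buf, TlvToken.Kind.TAG, 1));
            TlvToken length = take(buf, TlvToken.Kind.LENGTH, 1);
            tokens.add(length);
            // The decoded length decides how many bytes the next token spans.
            int n = length.bytes[0] & 0xFF;
            tokens.add(take(buf, TlvToken.Kind.VALUE, n));
        }
        return tokens;
    }

    private static TlvToken take(ByteBuffer buf, TlvToken.Kind kind, int n) {
        byte[] bytes = new byte[n];
        buf.get(bytes);
        return new TlvToken(kind, bytes);
    }
}
```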