From the homepage:
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for [...] or binary files.
I have been reading the docs for some hours now and think I have a basic understanding of ANTLR, but I'm having a hard time finding any references to processing binary files. And it seems I'm not the only one.
I need to create a parser for some binary data and would like to decide if ANTLR is of any help or not.
Binary data structure
That binary data is structured in logical fields: field1 is followed by field2, which is followed by field3, and so on, and each field has a special purpose. The lengths of the fields may differ AND may not be known when the parser is generated. For example, I know that field1 is always 4 bytes, field2 might simply be 1 byte, and field3 might be 1 to 10 bytes and might be followed by additional field3s of n bytes, depending on the actual value of the data. That is the second problem: I know the fields are there and, for field1, that it is 4 bytes long, but I don't know its actual value, and that value is what I'm interested in. The same goes for the other fields: I need the values of all of them.
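To make the layout more concrete, here is a minimal Python sketch of how I would read such a record by hand. It is only an illustration under assumptions that are not taken from the standard: field1 is treated as a little-endian unsigned 32-bit integer, and field3 is assumed to continue for as long as the high bit of the current byte is set, up to 10 bytes. The names `Record` and `parse_record` are made up for this example.

```python
import struct
from dataclasses import dataclass


@dataclass
class Record:
    field1: int    # always 4 bytes
    field2: int    # always 1 byte
    field3: bytes  # 1 to 10 bytes, length depends on the data itself


def parse_record(data: bytes) -> Record:
    """Parse one record; raises ValueError on truncated or oversized input."""
    header_size = struct.calcsize("<IB")  # 4 + 1 bytes, no padding
    if len(data) < header_size + 1:
        raise ValueError("record too short")

    # Fixed-size part. Assumption: little-endian unsigned integers.
    field1, field2 = struct.unpack_from("<IB", data, 0)

    # Variable-size part. Assumption: a set high bit (0x80) means
    # "one more byte belongs to field3"; at most 10 bytes in total.
    start = pos = header_size
    while True:
        if pos >= len(data):
            raise ValueError("truncated field3")
        byte = data[pos]
        pos += 1
        if not byte & 0x80:
            break  # last byte of field3
        if pos - start >= 10:
            raise ValueError("field3 longer than 10 bytes")

    return Record(field1, field2, data[start:pos])


raw = bytes([0x78, 0x56, 0x34, 0x12, 0x07, 0x81, 0x82, 0x03, 0xFF])
record = parse_record(raw)
```

With the sample bytes above, field3 ends at `0x03` because that is the first byte without the high bit set, so the trailing `0xFF` is left unconsumed. The point is that the length of field3 is only known while reading, not when the parser is written.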
What I need in ANTLR
This sounds to me like a common structure and use case for arbitrary binary data, but I don't see any special handling of such data in ANTLR. All examples use some kind of text, and I don't see any value-extraction callbacks or the like. Additionally, I think I would need callbacks that influence the parsing process itself. For example, a callback is called on the first byte of field3; I inspect that byte, decide that one to N additional bytes need to be consumed and that they logically belong to field3, and tell the parser so, so that it is able to proceed "somehow".
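The kind of callback interplay I have in mind could look roughly like the following plain-Python sketch. This is not ANTLR's API; every name here (`fixed`, `continuation_bit`, `parse`, `on_field`) is hypothetical, and the continuation-bit rule for field3 is an assumption made purely for illustration. A length callback sees the bytes consumed so far for the current field and tells the driver how many more to consume, and a listener-style callback receives each finished field.

```python
from typing import Callable, Dict, List, Tuple

# Given the bytes consumed so far for a field, return how many more bytes
# to consume; 0 means the field is complete.
LengthCallback = Callable[[bytes], int]


def fixed(size: int) -> LengthCallback:
    """Field with a length known in advance."""
    return lambda seen: size - len(seen)


def continuation_bit(max_len: int = 10) -> LengthCallback:
    """Field that continues while the last byte has its high bit set."""
    def need_more(seen: bytes) -> int:
        if not seen:
            return 1  # always read the first byte
        if seen[-1] & 0x80:
            if len(seen) >= max_len:
                raise ValueError("field exceeds maximum length")
            return 1
        return 0
    return need_more


def parse(data: bytes,
          spec: List[Tuple[str, LengthCallback]],
          on_field: Callable[[str, bytes], None]) -> int:
    """Walk the fields in order; return the number of bytes consumed."""
    pos = 0
    for name, need_more in spec:
        start = pos
        while True:
            count = need_more(data[start:pos])
            if count <= 0:
                break
            if pos + count > len(data):
                raise ValueError(f"truncated {name}")
            pos += count
        on_field(name, data[start:pos])  # listener-style notification
    return pos


fields: Dict[str, bytes] = {}
spec = [("field1", fixed(4)), ("field2", fixed(1)), ("field3", continuation_bit())]
raw = bytes([0x78, 0x56, 0x34, 0x12, 0x07, 0x81, 0x82, 0x03, 0xFF])
consumed = parse(raw, spec, fields.__setitem__)
```

Here the driver knows nothing about field3's length; the callback decides byte by byte, which is the behaviour I would like a generated parser to support.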
In the end, I would get some higher level "field" objects and ANTLR would provide the underlying parse logic with callbacks and listener infrastructure, walking abilities etc.
Did anyone ever do something like that and can provide some hints to examples or the concrete documentation I seem to have missed? Thanks!
EN 13757-3:2012
I don't think it really makes my question easier to understand, but the binary data I'm referring to is defined in the standard EN 13757-3:2012:
Communication systems for and remote reading of meters - Part 3: Dedicated application layer
The standard is not freely available on the net (anymore?), but the following PDF might give you an overview of what example data looks like on page 4. Note especially that the bytes of the mentioned fields are not constant; only the overall structure of the datagram is defined.
http://fastforward.ag/downloads/docu/FAST_EnergyCam-Protocol-wirelessMBUS.pdf
The tokens for the grammar would be the fields, each implemented by a different number of bytes but carrying a value, etc. Given ANTLR's self-description, I would have expected such things to work somehow...
Alternative: Kaitai.io
Whoever is currently in a position comparable to mine should have a look at Kaitai.io, which looks very promising: