I'm reverse engineering a file format that stores each field as TLV blocks (type, length, value).
The fields do not have to be in order, or even present at all. Their presence is denoted with a sentinel, which is a 16-bit type identifier and a 32-bit end offset. There are hundreds of unique identifiers, but a decent chunk of those are just single primitive values. aside from denoting the type, they can also identify what field the data should be stored in.
It is also worth noting that there will never be a duplicate id on a parent structure. The only time is can occur is if there are multiple of the same object type in an array/list.
I have successfully written a Kaitai definition for one of them:
meta:
id: struct_02ea
endian: le
seq:
- id: unk_00
type: s4
- id: fields
type: field_block
repeat: eos
types:
sentinel:
seq:
- id: id
type: u2
- id: end_offset
type: u4
field_block:
seq:
- id: sentinel
type: sentinel
- id: value
type:
switch-on: sentinel.id
cases:
0xF0: u1
0xF1: u1
0xF2: u1
0xF3: u1
0xF4: u4
0xF5: u4
size: sentinel.end_offset - _root._io.pos
Handling things this way does work, and I could likely map out the entire format like this. However, when it comes time to compiling this definition into another format, things get nasty.
Since I am wrapping each field in a field_block
, the generated code stores these values in that type of object. This is incredibly inefficient when half of the generated field_block
objects store a single integer. It would also require the consuming code to iterate through a list of each field block in order to get the actual field's value.
Ideally, I would like to define this structure so that the sentinels are only parsed while Kaitai is reading the data, and each value would be mapped to a field on the parent structure.
Is this possible? This technology is really cool, and I'd love to use it in my project, but I feel like the overhead that this is generating is a lot more trouble than it's worth.
Here's an example of the definition when compiled into C#:
using System.Collections.Generic;
namespace Kaitai
{
public partial class Struct02ea : KaitaiStruct
{
public static Struct02ea FromFile(string fileName)
{
return new Struct02ea(new KaitaiStream(fileName));
}
public Struct02ea(KaitaiStream p__io, KaitaiStruct p__parent = null, Struct02ea p__root = null) : base(p__io)
{
m_parent = p__parent;
m_root = p__root ?? this;
_read();
}
private void _read()
{
_unk00 = m_io.ReadS4le();
_fields = new List<FieldBlock>();
{
var i = 0;
while (!m_io.IsEof) {
_fields.Add(new FieldBlock(m_io, this, m_root));
i++;
}
}
}
public partial class Sentinel : KaitaiStruct
{
public static Sentinel FromFile(string fileName)
{
return new Sentinel(new KaitaiStream(fileName));
}
public Sentinel(KaitaiStream p__io, Struct02ea.FieldBlock p__parent = null, Struct02ea p__root = null) : base(p__io)
{
m_parent = p__parent;
m_root = p__root;
_read();
}
private void _read()
{
_id = m_io.ReadU2le();
_endOffset = m_io.ReadU4le();
}
private ushort _id;
private uint _endOffset;
private Struct02ea m_root;
private Struct02ea.FieldBlock m_parent;
public ushort Id { get { return _id; } }
public uint EndOffset { get { return _endOffset; } }
public Struct02ea M_Root { get { return m_root; } }
public Struct02ea.FieldBlock M_Parent { get { return m_parent; } }
}
public partial class FieldBlock : KaitaiStruct
{
public static FieldBlock FromFile(string fileName)
{
return new FieldBlock(new KaitaiStream(fileName));
}
public FieldBlock(KaitaiStream p__io, Struct02ea p__parent = null, Struct02ea p__root = null) : base(p__io)
{
m_parent = p__parent;
m_root = p__root;
_read();
}
private void _read()
{
_sentinel = new Sentinel(m_io, this, m_root);
switch (Sentinel.Id) {
case 243: {
_value = m_io.ReadU1();
break;
}
case 244: {
_value = m_io.ReadU4le();
break;
}
case 245: {
_value = m_io.ReadU4le();
break;
}
case 241: {
_value = m_io.ReadU1();
break;
}
case 240: {
_value = m_io.ReadU1();
break;
}
case 242: {
_value = m_io.ReadU1();
break;
}
default: {
_value = m_io.ReadBytes((Sentinel.EndOffset - M_Root.M_Io.Pos));
break;
}
}
}
private Sentinel _sentinel;
private object _value;
private Struct02ea m_root;
private Struct02ea m_parent;
public Sentinel Sentinel { get { return _sentinel; } }
public object Value { get { return _value; } }
public Struct02ea M_Root { get { return m_root; } }
public Struct02ea M_Parent { get { return m_parent; } }
}
private int _unk00;
private List<FieldBlock> _fields;
private Struct02ea m_root;
private KaitaiStruct m_parent;
public int Unk00 { get { return _unk00; } }
public List<FieldBlock> Fields { get { return _fields; } }
public Struct02ea M_Root { get { return m_root; } }
public KaitaiStruct M_Parent { get { return m_parent; } }
}
}