I'm using Construct 2.8 to reverse-engineer the header of files created by a long-lost Pascal program.

The header is made of a number of different records, some of which are optional, and I'm not sure whether the order is fixed or not.

For instance, two of the records look like this:

header_record_filetype = cs.Struct(
    'record_type' / cs.Int8ub,
    'file_type' / cs.PascalString(cs.Int16ub),
    'unknown' / cs.Int8ub
)

header_record_user = cs.Struct(
    'record_type' / cs.Int8ub,
    'user' / cs.PascalString(cs.Int16ub)
)

And I've identified half a dozen more.

How would I get the parser to choose the correct record type based on the record_type member, for an unknown number of records, until it reaches a record with type 0 (or the end of the file)?

MerseyViking

3 Answers

I've solved it like this:

header = cs.Struct(
    'record_type' / cs.Int8ub,
    'record' / cs.Switch(cs.this.record_type, {header_record_type_0x01: header_record_0x01,
                                               header_record_type_filename: header_record_filename,
                                               header_record_type_filetype: header_record_filetype,
                                               header_record_type_user: header_record_user,
                                               header_record_type_end: header_record_end,
                                               header_record_type_image_metadata: header_record_image_metadata},
                         default=header_record_end
                         ),
    'offset' / cs.Tell
)

with open(sys.argv[1], 'rb') as f:
    h = f.read(2048)
    index = 0
    record_type = h[index]

    while record_type != 0:
        record = header.parse(h[index:])
        print(record)
        index += record.offset
        record_type = record.record_type

But I don't know if that is the best* way of doing it.

*For some value of "best".


Edit

I found the RepeatUntil() construct hiding at the bottom of a help page. So now I have this:

header = cs.Struct(
    'type' / cs.Enum(cs.Int8ub,
                     file_metadata=0x01,
                     filename=0x02,
                     file_type=0x03,
                     user=0x0A,
                     image_metadata=0x10,
                     end=0xFF),

    'record' / cs.Switch(cs.this.type, {'file_metadata': header_record_file_metadata,
                                        'filename': header_record_filename,
                                        'file_type': header_record_filetype,
                                        'user': header_record_user,
                                        'end': header_record_end,
                                        'image_metadata': header_record_image_metadata}),
    'size' / cs.Tell
)

with open(sys.argv[1], 'rb') as f:
    h = f.read(2048)
    records = cs.RepeatUntil(lambda obj, lst, ctx: obj.type == 'end', header).parse(h)
    print(records)

Which feels a lot cleaner and more in keeping with the declarative nature of construct.
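
For anyone curious what Switch plus RepeatUntil amount to, here is a minimal hand-rolled sketch using only the standard struct module (so it is not Construct itself). The type codes 0x03, 0x0A and 0xFF are taken from the Enum above, and 0x00 is treated as a terminator as in the question. The helper names, the dict-based records, the ASCII decoding and the sample bytes are all assumptions made for illustration.

```python
import struct

def read_pascal_string(buf, pos):
    """Read a 16-bit big-endian length, then that many bytes (ASCII assumed)."""
    (length,) = struct.unpack_from('>H', buf, pos)
    start = pos + 2
    return buf[start:start + length].decode('ascii'), start + length

def parse_filetype(buf, pos):
    # Body of a file_type record: a Pascal string plus one unknown byte.
    file_type, pos = read_pascal_string(buf, pos)
    return {'file_type': file_type, 'unknown': buf[pos]}, pos + 1

def parse_user(buf, pos):
    # Body of a user record: a single Pascal string.
    user, pos = read_pascal_string(buf, pos)
    return {'user': user}, pos

# Dispatch table playing the role of Switch (hypothetical subset of types).
BODY_PARSERS = {0x03: parse_filetype, 0x0A: parse_user}
# Either value ends the header: 0x00 per the question, 0xFF per the Enum.
END_TYPES = {0x00, 0xFF}

def parse_header(buf):
    """Parse records until an end record or the end of the buffer."""
    records, pos = [], 0
    while pos < len(buf):
        record_type = buf[pos]
        pos += 1
        if record_type in END_TYPES:
            break
        parser = BODY_PARSERS.get(record_type)
        if parser is None:
            raise ValueError('unknown record type 0x%02X at offset %d'
                             % (record_type, pos - 1))
        body, pos = parser(buf, pos)
        records.append((record_type, body))
    return records

# Made-up sample: one file_type record, one user record, then an end marker.
sample = b'\x03\x00\x03IMG\x01' + b'\x0a\x00\x05alice' + b'\xff'
print(parse_header(sample))
```

The loop condition also covers the "reaches the end of the file" case from the question: if the terminator is missing, parsing simply stops when the buffer runs out.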

MerseyViking

For the record, I am a Construct developer. If you would like this code to be up to date with current versions, then:

  • String classes need an encoding; it is mandatory
  • Embedded does not support the IfThenElse and Switch classes
ArekBulski

You chose an interesting challenge. It appears construct does support various conditional definitions: http://construct.readthedocs.io/en/latest/misc.html#conditional

Additionally, I find the examples, such as this one, informative: https://github.com/construct/construct/blob/master/construct/examples/formats/executable/elf32.py

I would probably also define a header and body type, e.g.:

header_body_record_filetype = cs.Struct(
    'file_type' / cs.PascalString(cs.Int16ub),
    'unknown' / cs.Int8ub
)

header_body_record_user = cs.Struct(
    'user' / cs.PascalString(cs.Int16ub)
)

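header_record = cs.Struct(
    'record_type' / cs.Int8ub,
    # record_type is parsed as an integer, so compare against the numeric
    # code (0x0A is the 'user' value from the Enum in the other answer).
    'body' / cs.Embedded(cs.IfThenElse(cs.this.record_type == 0x0A,
        header_body_record_user,
        header_body_record_filetype,
    ))
)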
de1
  • Yeah, I ended up doing something along those lines. One of the problems I was having was determining the length of each record type, because of the variable-length strings. But figured out how to use Tell to, well, tell me :) – MerseyViking Oct 29 '17 at 19:13