I have a serialized bin file of protobuf data, written mainly with protobuf-net. I want to decompile it and see its structure.

I used some tools like https://protogen.marcgravell.com/decode

and I also used protoc:

protoc --decode_raw < ~/Downloads/file.bin

and this is part of the result I get:

1 {
  1: "4f81b7bb-d8bd-e911-9c1f-06ec640006bb"
  2: 0x404105b1663ef93a
  3: 0x4049c6158c593f36
  4: 0x40400000
  5 {
    1: "53f8afde-04c6-e811-910e-4622e9d1766e"
    2 {
      1: "e993fba0-8fc9-e811-9c15-06ec640006bb"
    }
    2 {
      1: "9a7c7210-3aca-e811-9c15-06ec640006bb"
      2: 1
    }
    2 {
      1: "2d7d12f1-2bc9-e811-9c15-06ec640006bb"
    }
    3: 18446744073709551615
  }
  6: 46
  7: 1571059279000
}

How can I decompile it? I want to know the structure, change the data in it, and make a new bin file.

alone
  • well, which fields do you need to change? the simplest thing would be to create your own .proto that matches the flavor of the above, and deserialize/mutate/serialize – Marc Gravell Oct 24 '19 at 17:09
  • well, I don't have any info about the program; I only have the bin file, no other info – alone Oct 24 '19 at 17:12
  • depending on the wire-types reported (varint vs fixed32 vs fixed64 etc) this looks like just a bunch of int32/string/etc with a repeated sub-message... do you have the actual bin? – Marc Gravell Oct 24 '19 at 17:17
  • Note: if you can't post the file here, but you can share it more privately, please feel free to email it to me and I can readily reverse-engineer a passable schema from it. It won't have the original names, and we won't know whether some fields are zig-zag etc, but: it'll round-trip fine, so should work for the purpose here – Marc Gravell Oct 24 '19 at 17:24
  • yes, it's exactly as you are saying (as I decompiled it using that online tool). Yes, I have the bin file; how can I send it to you? – alone Oct 24 '19 at 17:24
  • If it is small and not privileged: just post the hex or base-64 in the question. Otherwise again: email might be your best option (see my profile page). – Marc Gravell Oct 24 '19 at 17:26
  • I'll send it to your gmail. Thanks a lot – alone Oct 24 '19 at 17:28
  • I just sent it, thanks again – alone Oct 24 '19 at 17:36
  • received, reverse engineered, and returned; I've done the same again here using **just** the data in the question (so it might not be as complete), for the benefit of others – Marc Gravell Oct 24 '19 at 18:19
  • Note that if you had the .exe or .dll that was generating these files, it is often possible to obtain the field names and other information from there. – jpa Oct 25 '19 at 05:03
  • thanks, I don't have access to the code; I only have the serialized bin file. – alone Oct 25 '19 at 05:09

1 Answer

Reverse engineering a .proto file is mostly a case of looking at the output of tools such as the ones you've mentioned, and trying to write a .proto that looks similar. Unfortunately, a number of concepts are ambiguous if you don't know the schema, as multiple different data types and shapes share the same encoding details, but... we can make guesses.

Looking at your output:

1 {
...
}

tells us that our root message probably has a sub-message at field 1; so:

message Root {
    repeated Foo Foos = 1;
}

(I'm guessing at the repeated here; if the 1 only appears once, it could be single)

with everything at the next level being our Foo.

  1: "4f81b7bb-d8bd-e911-9c1f-06ec640006bb"
  2: 0x404105b1663ef93a
  3: 0x4049c6158c593f36
  4: 0x40400000
  5 { ... }
  6: 46
  7: 1571059279000

this looks like it could be

message Foo {
  string A = 1;
  sfixed64 B = 2;
  sfixed64 C = 3;
  sfixed32 D = 4;
  repeated Bar E = 5; // again, might not be "repeated" - see how many times it occurs
  int64 F = 6;
  int64 G = 7;
}

however, those sfixed64 could be double or fixed64, and that sfixed32 could be fixed32 or float; likewise, the int64 could be sint64 or uint64 - or int32, sint32, uint32, bool or an enum - and I wouldn't be able to tell (they are all just "varint"). Each option gives a different meaning to the value!
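
One way to narrow down the fixed-width guesses is to reinterpret the same bits each way and see which reading looks sensible. This is only a rough sketch in Python using the standard library; the helper names are made up for illustration, and reading field 7 as a Unix timestamp in milliseconds is a guess, not something the data proves:

```python
import struct
from datetime import datetime, timezone

def fixed64_readings(raw):
    """Return the three possible readings of one 64-bit fixed-width value."""
    packed = struct.pack("<Q", raw)  # fixed-width protobuf values are little-endian
    return {
        "fixed64": raw,
        "sfixed64": struct.unpack("<q", packed)[0],
        "double": struct.unpack("<d", packed)[0],
    }

def fixed32_readings(raw):
    """Return the three possible readings of one 32-bit fixed-width value."""
    packed = struct.pack("<I", raw)
    return {
        "fixed32": raw,
        "sfixed32": struct.unpack("<i", packed)[0],
        "float": struct.unpack("<f", packed)[0],
    }

# Values copied from fields 2, 3 and 4 of the decoded output.
for raw in (0x404105b1663ef93a, 0x4049c6158c593f36):
    print(fixed64_readings(raw))
print(fixed32_readings(0x40400000))  # the "float" reading is exactly 3.0

# Assumption: field 7 is milliseconds since the Unix epoch.
when = datetime.fromtimestamp(1571059279000 / 1000, tz=timezone.utc)
print(when.isoformat())
```

Here the two 64-bit values read as roughly 34 and 51.5 when treated as double, but as huge integers otherwise, and the 32-bit value reads as exactly 3.0 as a float, so double/double/float looks like a better guess than the sfixed types. Either choice round-trips to the same bytes.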

our Bar definitely has some kind of repeated, because of all the 2:

    1: "53f8afde-04c6-e811-910e-4622e9d1766e"
    2 { ... }
    2 { ... }
    2 { ... }
    3: 18446744073709551615

let's guess at:

message Bar {
  string A = 1;
  repeated Blap B = 2;
  int64 C = 3;
}
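
The 18446744073709551615 at field 3 is a good illustration of the varint ambiguity: the raw decode shows the varint as an unsigned number, and the same bits read differently depending on the declared type. A small Python sketch of the three 64-bit readings (the helper names are hypothetical):

```python
def as_int64(n):
    """Reinterpret an unsigned 64-bit varint as a two's-complement int64."""
    return n - (1 << 64) if n >= (1 << 63) else n

def as_sint64(n):
    """Undo the zig-zag encoding used by sint32/sint64."""
    return (n >> 1) ^ -(n & 1)

raw = 18446744073709551615  # field 3 of Bar, as printed by the raw decode

print(raw)             # reading as uint64: unchanged
print(as_int64(raw))   # -1
print(as_sint64(raw))  # -9223372036854775808
```

A -1 sentinel seems the most natural of these, which is a weak hint towards int64 rather than uint64 or sint64, but all three re-encode to the same bytes.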

and finally, looking at the 2 from the previous bit, we have:

      1: "e993fba0-8fc9-e811-9c15-06ec640006bb"

and

      1: "9a7c7210-3aca-e811-9c15-06ec640006bb"
      2: 1

and

      1: "2d7d12f1-2bc9-e811-9c15-06ec640006bb"

so combining those, we might guess:

message Blap {
    string A = 1;
    int64 B = 2;
}

Depending on whether you have more data, there may be additional fields, or you may be able to infer more context. For example, if an int64 value such as Blap.B is always 1 or omitted, it might actually be a bool. If one of the repeated elements always has at most one value, it might not be repeated.

The trick is to play with it until you can deserialize the data, re-serialize it, and get the exact same payload (i.e. round-trip).

Once you have that: you'll want to deserialize it, mutate the thing you wanted to change, and serialize.
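
The round-trip check and the mutate step can also be tried without any schema, by working directly on the wire format. The following is a schema-free sketch in plain Python, not protobuf-net or protoc; the function names and the 7-byte sample message (shaped like a Blap: a string at field 1, a varint at field 2) are invented for illustration, and the deprecated group wire types are not handled:

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def parse_fields(buf):
    """Split one message into (field_number, wire_type, value) tuples."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit fixed, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or sub-message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # 32-bit fixed, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields

def serialize_fields(fields):
    """Inverse of parse_fields."""
    out = bytearray()
    for number, wire_type, value in fields:
        out += write_varint((number << 3) | wire_type)
        if wire_type == 0:
            out += write_varint(value)
        elif wire_type == 2:
            out += write_varint(len(value)) + value
        else:
            out += value
    return bytes(out)

# Invented sample: field 1 = "abc", field 2 = 1.
sample = bytes.fromhex("0a036162631001")
fields = parse_fields(sample)

# Round-trip check: re-serializing must reproduce the input exactly.
assert serialize_fields(fields) == sample

# Mutate field 2 (1 -> 0) and serialize the new payload.
mutated = [(n, t, 0 if n == 2 else v) for n, t, v in fields]
print(serialize_fields(mutated).hex())
```

Sub-messages stay as raw bytes at this level; to edit something nested, call parse_fields on that value, change it, and put the re-serialized bytes back. With real data, writing a .proto and using a proper library is the more comfortable route; this only shows what the round-trip means at the byte level.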

Marc Gravell