
I have some binary data that was obtained by serializing a Google Protocol Buffers class. How do I find out, at runtime, which class the data was serialized from?

For example, suppose I have a class abc, and I serialize an instance of it into binary data. Is there any way of validating that this binary data was obtained by serializing class abc, and not some other class?

Further, if I parse this binary data of class abc with the parse method of class xyz, how would I know whether the parse was successful?

Tushar Koul
  • If you have control over both ends of the connection you can "cheat" by adding the name of the (outermost) class as a prefix to the Google protocol buffers data. That's what I'm doing, see here http://stackoverflow.com/a/17923846/253938 . – RenniePet Aug 07 '13 at 23:32
  • Hmm..that would work :D..but sadly I don't have control over the sender's side – Tushar Koul Aug 08 '13 at 06:18

1 Answer


protobuf does not include any type information on the wire (unless you add it yourself, external to protobuf). As such, you cannot strictly validate the type - which is actually a good thing, because it means that types are interchangeable and compatible. As long as class abc has a compatible contract to the other type, it will work. By "compatible" here, I mean: for any field numbers that are common to both, they have compatible wire types. If abc declares field 4 to be a string, and the other class declares field 4 to be a double-precision number, then the two disagree on the wire type of field 4; depending on the implementation, deserialization will then either fail or treat the mismatched field as unknown data.
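To make the "compatible wire types" point concrete, here is a minimal sketch in plain Python that walks the top-level fields of a payload and reports field numbers whose wire type disagrees with a candidate contract. It does not use any protobuf library; `read_varint`, `scan_fields`, `find_conflicts`, and the `ABC`/`XYZ`/`OTHER` contracts are hypothetical names made up for illustration, and the deprecated group wire types (3 and 4) are simply rejected.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint too long")


def scan_fields(buf):
    """Yield (field_number, wire_type) for each top-level field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if field_number == 0:
            raise ValueError("field number 0 is not valid")
        if wire_type == 0:        # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed 64-bit (e.g. double)
            pos += 8
        elif wire_type == 2:      # length-delimited (e.g. string)
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:      # fixed 32-bit (e.g. float)
            pos += 4
        else:                     # groups are not handled in this sketch
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        yield field_number, wire_type


def find_conflicts(buf, expected):
    """Field numbers present in buf whose wire type differs from expected."""
    return [n for n, wt in scan_fields(buf) if n in expected and expected[n] != wt]


# Hypothetical payload: field 1 = varint 150, field 4 = string "hi".
DATA = bytes.fromhex("08960122026869")

ABC = {1: 0, 4: 2}    # field 4 declared as a string
XYZ = {1: 0, 4: 1}    # field 4 declared as a double
OTHER = {1: 0, 7: 2}  # unrelated contract that happens not to clash

print(find_conflicts(DATA, ABC))    # [] : consistent with abc
print(find_conflicts(DATA, XYZ))    # [4]: cannot be an xyz as declared
print(find_conflicts(DATA, OTHER))  # [] : no clash, yet not the same type
```

Note that the last case returns no conflicts even though `OTHER` is a different contract: an empty result only means nothing ruled the type out.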

One other "signal" you could use is the omission of required fields: if abc always includes field 3, but you get data that omits field 3, then it probably isn't an abc. Note that protobuf is designed to be version tolerant, though: you can't assume that extra fields mean it isn't an abc, as it could be that the data is using a later version of the contract, or is using extension fields. Likewise, optional fields could be missing either because the sender simply chose not to provide a value, or because that field is not declared on the version of the contract they are using.
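The two signals can be combined into a rough heuristic. The sketch below assumes you already have the list of (field number, wire type) pairs seen in the payload, however you obtained it; `classify`, `ABC_FIELDS`, and `ABC_REQUIRED` are hypothetical names for illustration only. It can only ever produce evidence against a type, never proof of one.

```python
def classify(seen, expected, required):
    """Heuristically compare a payload against one candidate contract.

    seen:     iterable of (field_number, wire_type) pairs found in the payload
    expected: {field_number: wire_type} declared by the candidate contract
    required: set of field numbers the candidate always writes
    """
    seen = list(seen)
    # A shared field number with a different wire type is an actual conflict.
    if any(n in expected and expected[n] != wt for n, wt in seen):
        return "conflict"
    # A required field that never appears suggests a different type.
    if required - {n for n, _ in seen}:
        return "missing required field"
    # Unknown extra fields are tolerated: newer contract version or extensions.
    return "no evidence against"


# Hypothetical contract for abc: field 3 is required.
ABC_FIELDS = {1: 0, 3: 2, 4: 2}
ABC_REQUIRED = {3}

print(classify([(1, 0), (4, 2)], ABC_FIELDS, ABC_REQUIRED))          # missing required field
print(classify([(1, 0), (3, 2), (9, 0)], ABC_FIELDS, ABC_REQUIRED))  # no evidence against
print(classify([(3, 2), (4, 1)], ABC_FIELDS, ABC_REQUIRED))          # conflict
```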

Re testing for a successful parse: that will be implementation-specific. I would imagine that the C++ implementation will have either a return value to check or a flag field to check; I don't use that API myself, so I cannot say. On some other platforms (Java, .NET, etc.) I would expect an exception to be thrown if there was a critical issue.

Marc Gravell
  • parse functions do return a bool value. But sadly they are returning _true_ in both cases - if I parse using either `class abc` or `class xyz` – Tushar Koul Mar 06 '13 at 12:22
  • @TusharKoul well, comparing to the rest of my answer: is there anything between `abc` and `xyz` that would make them *incompatible*? The main things here would be a field-number with very different meanings in the two cases. Overall, protobuf adopts the philosophy that you already know what your data is meant to be and *get it right*. – Marc Gravell Mar 06 '13 at 12:37
  • The question is not one of compatibility. I just wanted validation that the data will be parsed for the correct class. Like you mentioned in the comment, if the class was wrong and the parse returned false, I would know that I'm calling the parse for the wrong class. But I guess you _are_ right about the philosophy of protobuf that the class type should be already known. – Tushar Koul Mar 06 '13 at 12:46
  • @TusharKoul indeed, what I'm trying to demonstrate here is that you **cannot** know that it was the *same* type, but there are some limited occasions when you **can** determine that it was a *different* type - i.e. when there is an actual conflict. The absence of a conflict does not indicate sameness. – Marc Gravell Mar 06 '13 at 13:07