Can you inject serialized message into another protobuf message

Question

We work with a pipeline of kafka/samza jobs using protobuf encoded messages. The pipeline can be quite lengthy for certain data sets and we want to add a timestamp/id for each stage in the pipeline to monitor efficiency and service health.

The additional information would be added to a repeated field in the schema called touchpoints. Obviously decoding the message in java/samza, adding the additional message and serializing again has an overhead which increases with the size of the message (some can be quite large increasing deserialize time), some parts of the pipe are just filters which check the message key and may not even have to deserialize at all so the less overhead on these the better.

Is it possible to just inject a second serialized message into an existing message without deserializing, if so would this be very bad practice to do so (I can only think it would) and is there a better solution to not having to deserialize/add/serialize for monitoring message path/time to flow

score 4 · Accepted Answer · answered Apr 22 '16 at 11:14

In general, this would be quite tricky and can't be done in a "streaming" way for the following reason: Child messages are prefixed with their size encoded in a variable length integer. So injecting something would mean to adjust all the parent sizes recursively up to the root, and the size changes may move the contents again because of the variable length encoding of the size.

One thing you could do to avoid this problem might be to use fixed size fields for the time stamps, and to make sure they are filled with a value when building the protos at the first stage, so you have already allocated the corresponding space in the proto. This should allow you to to scan the proto for the (ideally unique) time stamp field ids using a CodedInputStream, and to write the patched stream back using a CodedOutputStream. Getting this correct will still require understanding the internal format. I'd recommend to start with an empty pass-trough "filter" first and to check that the output matches the input (update the question if you run into any issues with that)

Totally forgot about message size prefix that does complicate matter somewhat. Not sure fixed values would work as we have no idea of the number of stages a message may have or not. Will make a good future project for a lib though so I'll probably come back to it in the future. Cheers for the info! — Philip Pryde, Apr 22 '16 at 11:46

Can you inject serialized message into another protobuf message

1 Answers1