We work with a pipeline of Kafka/Samza jobs that exchange protobuf-encoded messages. The pipeline can be quite long for certain data sets, and we want to add a timestamp/id at each stage to monitor efficiency and service health.
The additional information would be added to a repeated field in the schema called touchpoints. Decoding the message in Java/Samza, adding the extra entry, and serializing again obviously has an overhead that grows with the size of the message (some are quite large, which increases deserialization time). Some parts of the pipeline are just filters that check the message key and may not need to deserialize at all, so the less overhead on these the better.
Is it possible to inject a second serialized message into an existing serialized message without deserializing it? If so, would this be very bad practice (I can only think it would)? And is there a better way to record a message's path and timing through the pipeline without the deserialize/add/serialize cycle at every stage?
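To make the idea concrete, here is a minimal sketch of what such an injection could look like at the byte level. It assumes that touchpoints is a top-level repeated message field of the outer message, and it relies on the documented protobuf behaviour that parsers accept fields in any order and concatenate the elements of a repeated field. The class name `TouchpointAppender`, the method names, and the field number passed in are hypothetical; the real field number has to come from the .proto file, and the payload must already be a valid serialized touchpoint entry.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: appends one length-delimited field to an already
// serialized protobuf message without parsing the existing bytes.
final class TouchpointAppender {
    // Wire type 2 is "length-delimited" (used for embedded messages, strings, bytes).
    private static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

    private TouchpointAppender() {}

    // Writes an unsigned base-128 varint, least significant group first.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // set continuation bit
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Returns serializedMessage followed by: tag, payload length, payload.
    // fieldNumber is an assumption and must match the touchpoints field in the schema.
    static byte[] appendField(byte[] serializedMessage, int fieldNumber, byte[] serializedTouchpoint) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(
                serializedMessage.length + serializedTouchpoint.length + 10);
        out.write(serializedMessage, 0, serializedMessage.length);
        writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_LENGTH_DELIMITED);
        writeVarint(out, serializedTouchpoint.length);
        out.write(serializedTouchpoint, 0, serializedTouchpoint.length);
        return out.toByteArray();
    }
}
```

A stage would call something like `TouchpointAppender.appendField(messageBytes, 15, touchpointBytes)`, where 15 stands in for the real field number. The existing bytes are only copied, never parsed, so the cost is one array copy plus the size of the new entry. Whether this counts as acceptable practice is still the open part of the question, since it bypasses the generated classes and any validation they perform.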