What is the cause of the unexpected bytes in a Java KafkaProtobufSerializer value?

Question

I have two applications: a Node app using the @kafkajs/confluent-schema-registry library, and a Java app using the standard KafkaProtobufSerializer.

The topic is bound to a Protobuf schema.

When both apps serialize the same object and I inspect the topic in the KafkaUI application, the bytes are not the same (and thus KafkaUI cannot read the value written by the Node app with the SchemaRegistry value Serde).

Confluent tells us that:

Byte 1 is the magic byte: 0.
Bytes 2-5 are occupied by the registry ID (in this case 5).
Byte 6 onwards is the encoded object.

However, the Protobuf-encoded object has the following bytes: 08 4d 12 04 70 65 74 65

Java value using KafkaProtobufSerializer: 00 00 00 00 05 00 08 4d 12 04 70 65 74 65

Node value using confluent-schema-registry: 00 00 00 00 05 08 4d 12 04 70 65 74 65

Where did this additional byte in the Java output come from and, more pertinently, how is it derived? I expect both applications to produce identical byte arrays for an object of the same type (generated from the same .proto file) with the same property values. The difference is a problem because it means I cannot consume the Node app's data with my Java consumer.

Example of a different type (but with registry ID 4):

Java: 00 00 00 00 04 02 04 0a 26... (rest of object)

Node: 00 00 00 00 04 0a 26... (rest of object)

Where did these 2 additional bytes come from? According to Confluent they shouldn't be there, but if we trust the Java libraries, theirs is the correct version.

Furthermore, I downloaded the protoscope application and passed the assumed payload into it. It does not work if I pass the Java value from byte 6 onwards; it only works from byte 8. This shows that the extra bytes do not come from the Protobuf serialization, and again raises the question: what are these bytes for, and why do they seem to matter to Java?

score 0 · Answer 1 · answered Jul 27 '23 at 17:05

"6th onwards are the encoded object"

This is correct for Avro and JSON Schema, but not for Protobuf.

See "Message indexes" in the Confluent wire format documentation:

"an array of indexes that corresponds to the message type (which may be nested). A single Schema Registry Protobuf entry may contain multiple Protobuf messages, some of which may have nested messages. The role of message-indexes is to identify which Protobuf message in the Schema Registry entry to use"

https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format

The extra bytes in the Java output are these message indexes, which the Node output omits. The Java output is correct; the Node.js output is not.
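To show how the extra bytes are derived, here is a minimal sketch of a parser for the prefix in front of the Protobuf payload. It assumes the layout described in the linked wire format documentation: a magic byte, a 4-byte big-endian schema ID, then the message indexes as a length-prefixed array of zig-zag varints, with a single 0 byte as shorthand for the array [0] (the first message in the schema). The class and method names are hypothetical and are not part of any Confluent library.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a Confluent-framed Protobuf value into its header parts.
class ConfluentProtobufHeader {
    final int schemaId;
    final List<Integer> messageIndexes;
    final int payloadOffset; // 0-based offset of the first Protobuf byte

    ConfluentProtobufHeader(int schemaId, List<Integer> messageIndexes, int payloadOffset) {
        this.schemaId = schemaId;
        this.messageIndexes = messageIndexes;
        this.payloadOffset = payloadOffset;
    }

    static ConfluentProtobufHeader parse(byte[] value) {
        ByteBuffer buf = ByteBuffer.wrap(value); // ByteBuffer is big-endian by default
        byte magic = buf.get();
        if (magic != 0) {
            throw new IllegalArgumentException("Unexpected magic byte: " + magic);
        }
        int schemaId = buf.getInt();

        // Assumed encoding: array length first, then one zig-zag varint per index.
        int count = readZigZagVarint(buf);
        List<Integer> indexes = new ArrayList<>();
        if (count == 0) {
            indexes.add(0); // a lone 0 byte is shorthand for [0]
        } else {
            for (int i = 0; i < count; i++) {
                indexes.add(readZigZagVarint(buf));
            }
        }
        return new ConfluentProtobufHeader(schemaId, indexes, buf.position());
    }

    // Reads a base-128 varint and undoes the zig-zag mapping (0, -1, 1, -2, 2, ...).
    static int readZigZagVarint(ByteBuffer buf) {
        int raw = 0;
        int shift = 0;
        while (true) {
            int b = buf.get() & 0xFF;
            raw |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("Varint is too long for an int");
            }
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    @Override
    public String toString() {
        return "schemaId=" + schemaId + " indexes=" + messageIndexes
                + " payloadOffset=" + payloadOffset;
    }

    public static void main(String[] args) {
        // First Java value from the question.
        byte[] javaValue = {0, 0, 0, 0, 5, 0, 0x08, 0x4d, 0x12, 0x04, 0x70, 0x65, 0x74, 0x65};
        System.out.println(parse(javaValue));
    }
}
```

Under these assumptions, the first Java value in the question parses to schema ID 5, message indexes [0] and a 0-based payload offset of 6, so the lone 00 selects the first message in the schema. The second value (02 04) parses to schema ID 4 and message indexes [2], since 02 decodes to an array length of 1 and 04 decodes to index 2; the Protobuf payload then starts at 0-based offset 7, which matches protoscope only working from byte 8.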
