
I'm writing a Java program that reads from an Apache Kafka data stream. The messages are Avro serialized; each message contains a single Avro-serialized record. The class io.confluent.kafka.serializers.KafkaAvroDeserializer is not available on my development platform; if it were, it would be a simple matter of setting the "value.deserializer" property to KafkaAvroDeserializer.class for Avro deserialization. Therefore, I need to write my own deserializer. All I've found online is how to serialize records, write them to a file using a specified Avro schema, and then how to read back and DEserialize the written records. The Java Parsers library, as far as I can tell, doesn't provide much help, as it assumes that the program receives the data in UNserialized form, then gives the necessary object methods to serialize it, then the necessary methods to DEserialize it.

So, my question is this: how do I deserialize an Avro-formatted record in Java? Just a single record. I have the serialized record, and I have the Avro schema. I don't need to append it to a file; I simply need to deserialize that record. I can't believe this hasn't been done before, but I've literally found nothing about it online. Can anybody help me with this?

Thank you in advance.

I've tried using the Java Parsers library, various Kafka Avro "value.deserializer" property settings, and different packages in my Eclipse pom.xml file. All to no avail.

There's GOT to be a simple way of deserializing a single Avro record by providing the necessary schema (which I've got).

  • The link sent does NOT answer my question; I'd seen that link a number of times previously. It shows how to SERIALIZE objects to a file, then DESERIALIZE by reading from that file. I specifically stated that that's NOT what I need to do. I need to simply take an already-serialized record that's NOT in a file, and deserialize it. – Stuart Odom Mar 20 '23 at 16:49
  • There is one file. An AVSC file. Using a Schema is a requirement to use Avro. You can also hard-code the schema in a `String` and use `Schema.Parser`.... The rest is coming from Avro `byte[]` data already in Kafka, not any "files" – OneCricketeer Mar 20 '23 at 19:11
  • Maybe you can edit your question to clarify what exactly you tried as a [mcve]? Otherwise, we can only speculate what your actual problem is. – OneCricketeer Mar 20 '23 at 19:14
  • Are your messages coming from an environment with the Confluent Schema Registry? If yes - take a look at this [answer](https://stackoverflow.com/a/31206063/355438). It reveals the structure of such messages that were serialized using Confluent's `KafkaAvroSerializer`. Also, you may take a look at the alternative implementation of this format: https://github.com/LinkedInAttic/camus/tree/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders – Ilya Serbis Apr 13 '23 at 21:25
  • If you control serialization by yourself, then you can utilize an approach without Confluent Schema Registry: https://stackoverflow.com/questions/47268010/avro-serializer-and-deserializer-with-kafka-java-api – Ilya Serbis Apr 13 '23 at 21:33

1 Answer


It has been done before. The Avro documentation (and/or JavaDoc) covers exactly what you need, which is the BinaryDecoder class.

That same documentation assumes records are serialized. However, Confluent's Avro-serialized records are not the same as directly serialized records, so it's unclear what exactly you need.

The class io.confluent.kafka.serializers.KafkaAvroDeserializer is not available on my development platform

Then use Maven / Gradle to add it? But this will require you to have a running Schema Registry, not just an Avro schema file.

All I've found online has been ...

Confluent's deserializer code is open-source, so you could just copy the class on your own. As mentioned, it's using the BinaryDecoder class.
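As a minimal sketch of that approach (the class and method names here are my own, and the schema is assumed to be available as a JSON string from your AVSC file), deserializing one raw Avro record with only the Apache Avro library looks like this:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Deserializes a single Avro-encoded record from a byte[] --
// no files, no Kafka, no Schema Registry.
public class SingleRecordDeserializer {

    private final Schema schema;
    private final GenericDatumReader<GenericRecord> reader;

    public SingleRecordDeserializer(String schemaJson) {
        // Parse the schema from its JSON text (the contents of an .avsc file).
        this.schema = new Schema.Parser().parse(schemaJson);
        this.reader = new GenericDatumReader<>(schema);
    }

    // Turn the raw Avro bytes of one record back into a GenericRecord.
    public GenericRecord deserialize(byte[] data) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        return reader.read(null, decoder);
    }
}
```

After that, fields are read with `record.get("fieldName")`. If you have generated specific classes from the schema, `SpecificDatumReader` can be substituted for `GenericDatumReader`.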

OneCricketeer
  • I can't add it. I'm in an environment in which it's not available for adding. That's why I'm looking for some help with Java code to deserialize it myself. And creating DatumWriter's and DatumReader's is not the answer. – Stuart Odom Mar 20 '23 at 16:51
  • Explain why that is not the answer? `SpecificDatumReader` (or `GenericDatumReader`) is what returns you a specific (or generic) Java object to consume and process. – OneCricketeer Mar 20 '23 at 19:10
  • And why exactly can't you add external libraries? Are you disconnected from the internet, or can you not otherwise copy JARs into that project? How did you add `kafka-clients`, then? You mentioned you have a `pom.xml`, so clearly you are using Maven and should be able to add it... https://docs.confluent.io/kafka-clients/java/current/overview.html – OneCricketeer Mar 20 '23 at 19:16
  • @OneCricketeer, as you correctly pointed out in your answer Avro messages that were serialized using Confluent Schema Registry are not simple Avro messages because they contain additional data (schema ID). You can't just feed such a message to the `BinaryDecoder` - you'll get an exception. – Ilya Serbis Apr 13 '23 at 15:26
  • @IlyaSerbis You can, but it requires using `ByteArrayDeserializer`, then `ByteBuffer` to skip over the first 5 bytes before using `BinaryDecoder` on the remainder of the data. This is exactly [what confluent already does](https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroDeserializer.java#L485). But doing that on your own, requires providing the schema somehow other than looking up from the Registry... – OneCricketeer Apr 13 '23 at 21:10
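To illustrate that last comment with a hedged sketch (the class name is my own; the layout of one magic byte, a 4-byte big-endian schema ID, and then the Avro payload is Confluent's wire format): stripping the 5-byte header needs only `ByteBuffer`, after which the remaining bytes can be handed to `BinaryDecoder` with a locally held schema.

```java
import java.nio.ByteBuffer;

// Strips the Confluent Schema Registry framing from a Kafka record value:
// byte 0 is a magic byte (0x0), bytes 1-4 are the schema ID, and the
// rest is the plain Avro-encoded record.
public class ConfluentWireFormat {

    private static final byte MAGIC_BYTE = 0x0;

    public static byte[] stripHeader(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Not a Confluent-framed Avro message");
        }
        int schemaId = buffer.getInt(); // normally used to look the schema up in the Registry
        byte[] avroBytes = new byte[buffer.remaining()];
        buffer.get(avroBytes); // copy out the remaining Avro payload
        return avroBytes;
    }
}
```

The returned bytes are what you would feed to `DecoderFactory.get().binaryDecoder(...)` together with your own copy of the schema, since without the Registry the schema ID cannot be resolved.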