Of course you can do that without any Confluent tooling. But then you have to do the additional work on your side (e.g. in your application code), which was the original motivation for providing Avro-related tooling such as the Confluent tools you mentioned.
One option is to manually serialize/deserialize the payload of your Kafka messages (e.g. from YourJavaPojo to byte[]) by using the Apache Avro Java API directly. (I suppose that you implied Java as the programming language of choice.) What would this look like? Here's an example.
- First, you would manually serialize the data payload in your application that writes data to Kafka. Here, you could use the Avro serialization API to encode the payload (from a Java POJO to byte[]), and then use Kafka's Java producer client to write the encoded payload to a Kafka topic; see the first sketch after this list.
- Then, downstream in your data pipeline, you would deserialize in another application that reads data from Kafka. Here, you could use Kafka's Java consumer client to read the (encoded) data from the same Kafka topic, and then use the Avro deserialization API to decode the payload back again (from byte[] to a Java POJO); see the second sketch below.
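To make the first step concrete, here is a minimal sketch of the producer side in Java. The Avro-generated class User and the topic name users are hypothetical placeholders (substitute your own generated class and topic); everything else uses only the plain Apache Avro and Apache Kafka APIs.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumWriter;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {

  // Manually encode an Avro-generated POJO (here: the hypothetical `User`
  // class generated from your Avro schema) into a byte[].
  static byte[] serialize(User user) throws IOException {
    SpecificDatumWriter<User> writer = new SpecificDatumWriter<>(User.class);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(user, encoder);
    encoder.flush();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    // The payload is already a byte[], so Kafka only needs the pass-through
    // ByteArraySerializer; no Confluent serializers involved.
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");

    try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
      User user = new User(); // construct and populate your POJO as appropriate
      producer.send(new ProducerRecord<>("users", serialize(user)));
    }
  }
}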
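And the matching consumer side, again just a sketch under the same assumptions (the hypothetical User class and users topic):

import java.io.IOException;
import java.util.Collections;
import java.util.Properties;

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AvroConsumerExample {

  // Manually decode the byte[] payload back into the Avro-generated POJO.
  static User deserialize(byte[] bytes) throws IOException {
    SpecificDatumReader<User> reader = new SpecificDatumReader<>(User.class);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    return reader.read(null, decoder);
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "avro-example");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    // Again, Kafka just hands us the raw byte[]; Avro does the real work.
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("users"));
      while (true) {
        ConsumerRecords<String, byte[]> records = consumer.poll(1000);
        for (ConsumerRecord<String, byte[]> record : records) {
          User user = deserialize(record.value());
          System.out.println("Decoded pojo: " + user);
        }
      }
    }
  }
}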
You can also use the Avro API directly, of course, when working with stream processing tools like Kafka Streams (which will be included in the upcoming Apache Kafka 0.10 release) or Apache Storm.
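If you go down that route, it can be convenient to wrap the manual Avro coding in Kafka's own Serializer/Deserializer interfaces, so that the same code plugs into Kafka Streams as well as the plain producer/consumer clients. The following is only a sketch: ManualAvroCodec is a made-up class name, not an existing API, and it again assumes Avro's code-generated (SpecificRecord) classes.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.apache.avro.specific.SpecificRecord;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// Hypothetical helper (not part of Kafka or Avro): adapts the manual Avro
// coding shown above to Kafka's Serializer/Deserializer contract.
public class ManualAvroCodec<T extends SpecificRecord>
    implements Serializer<T>, Deserializer<T> {

  private final SpecificDatumWriter<T> writer;
  private final SpecificDatumReader<T> reader;

  public ManualAvroCodec(Class<T> type) {
    this.writer = new SpecificDatumWriter<>(type);
    this.reader = new SpecificDatumReader<>(type);
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {
    // No configuration needed in this sketch.
  }

  @Override
  public byte[] serialize(String topic, T data) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
      writer.write(data, encoder);
      encoder.flush();
      return out.toByteArray();
    } catch (IOException e) {
      throw new RuntimeException("Avro serialization failed", e);
    }
  }

  @Override
  public T deserialize(String topic, byte[] bytes) {
    try {
      BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
      return reader.read(null, decoder);
    } catch (IOException e) {
      throw new RuntimeException("Avro deserialization failed", e);
    }
  }

  @Override
  public void close() {
    // Nothing to clean up.
  }
}

If you then need an actual Serde instance for Kafka Streams, you can combine the two halves via Serdes.serdeFrom(new ManualAvroCodec<>(User.class), new ManualAvroCodec<>(User.class)).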
Lastly, you also have the option to use some utility libraries (whether from Confluent or elsewhere) so that you don't have to use the Apache Avro API directly. For what it's worth, I have published some slightly more complex examples at kafka-storm-starter, e.g. AvroDecoderBolt.scala, where the Avro serialization/deserialization is done with the Scala library Twitter Bijection. Here's a snippet of AvroDecoderBolt.scala to give you the general idea:
// This tells Bijection how to automagically deserialize a Java type `T`,
// given a byte array `byte[]`.
implicit private val specificAvroBinaryInjection: Injection[T, Array[Byte]] =
  SpecificAvroCodecs.toBinary[T]

// Let's put Bijection to use.
private def decodeAndEmit(bytes: Array[Byte], collector: BasicOutputCollector) {
  require(bytes != null, "bytes must not be null")
  val decodeTry = Injection.invert(bytes) // <-- deserialization, using Twitter Bijection, happens here
  decodeTry match {
    case Success(pojo) =>
      log.debug("Binary data decoded into pojo: " + pojo)
      collector.emit(new Values(pojo)) // <-- Here we are telling Storm to send the decoded payload to downstream consumers
      ()
    case Failure(e) => log.error("Could not decode binary data: " + Throwables.getStackTraceAsString(e))
  }
}
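(The write path is simply the forward direction of the same Injection: applying the implicit Injection[T, Array[Byte]] to a POJO yields the encoded Array[Byte].)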
So yes, you can of course opt not to use any additional libraries such as Confluent's Avro serializers/deserializers (currently made available as part of confluentinc/schema-registry) or Twitter's Bijection. Whether that's worth the additional effort is up to you to decide.