
When trying to stream Avro data with Kafka Streams, I came across this error:

Exception in thread "StreamThread-1" org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1 Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
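In case it helps others hitting this: as I understand it, Confluent's Avro serializer prefixes every message with a magic byte (0x00) and a 4-byte schema ID before the actual Avro payload, and this error means the deserializer did not find that prefix. A quick way to see what is actually on a topic is to consume the raw bytes; this is only a minimal sketch, and the broker address, group id, and topic name are placeholders:

import java.util.{Collections, Properties}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.ByteArrayDeserializer

val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG, "magic-byte-check")
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

// Consume raw bytes so no deserializer can fail on the payload
val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](
  props, new ByteArrayDeserializer, new ByteArrayDeserializer)
consumer.subscribe(Collections.singletonList("myTopic"))

for (rec <- consumer.poll(5000L).asScala) {
  val value = rec.value()
  // Confluent wire format: byte 0 is the magic byte 0x00, bytes 1-4 the schema ID
  val confluentAvro = value != null && value.nonEmpty && value(0) == 0x00.toByte
  println(s"offset=${rec.offset()} looksLikeConfluentAvro=$confluentAvro")
}
consumer.close()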

Even though I found several older threads about it on the mailing list, none of the solutions suggested there fixed my problem. So hopefully, I can find a solution here.

My setup looks as follows:

p.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String.getClass.getName)
p.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, classOf[GenericAvroSerde])
p.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "localhost:8081")

I already tried setting the KEY_SERDE to the same value as the VALUE_SERDE, but even though this was marked as a solution on the mailing list, it did not work in my case.

I'm generating a GenericData.Record with my schema as follows:

val record = new GenericData.Record(schema)
...
record.put(field, value)
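For illustration, a self-contained version of this with a hypothetical two-field schema (my real schema is larger) would be:

import org.apache.avro.Schema
import org.apache.avro.generic.GenericData

// Hypothetical schema for illustration only
val schemaJson =
  """{"type": "record", "name": "Measurement", "fields": [
    |  {"name": "sensorId", "type": "string"},
    |  {"name": "value", "type": "double"}
    |]}""".stripMargin
val schema = new Schema.Parser().parse(schemaJson)

val record = new GenericData.Record(schema)
record.put("sensorId", "sensor-42")
record.put("value", 23.7)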

When I run in debug mode and check the generated record, everything looks fine: there is data in the record and the mapping is correct.

I write the KStream to the output topic like this (I used branch beforehand):

splitTopics.get(0).to(s"${destTopic}_Testing")
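The branch step itself looked roughly like this; the predicates, the `parsedStream` name, and the routing field are placeholders for my actual logic:

import org.apache.avro.generic.GenericRecord
import org.apache.kafka.streams.kstream.{KStream, Predicate}

// Hypothetical routing predicates; branch() returns one KStream per predicate, in order
val isTesting: Predicate[String, GenericRecord] =
  (_: String, v: GenericRecord) => v.get("recordType").toString == "Testing"
val isOther: Predicate[String, GenericRecord] =
  (_: String, _: GenericRecord) => true

// Collect the branches into a list, hence the .get(0) access above
val splitTopics: java.util.List[KStream[String, GenericRecord]] =
  java.util.Arrays.asList(parsedStream.branch(isTesting, isOther): _*)

splitTopics.get(0).to(s"${destTopic}_Testing")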

I'm using GenericData.Record for the records. Might this be a problem in combination with the GenericAvroSerde?

  • What is your overall setup? Single topic? Can you read the data with the console consumer? Maybe this example helps: https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/main/java/io/confluent/examples/streams/WikipediaFeedAvroExample.java – Matthias J. Sax Dec 26 '17 at 18:47
  • The input topic has raw text (which is parsed in Kafka Streams) and then mapped to `GenericData.Record` records. The output topics will have Avro data. I can read the messages from the input topic with the console consumer. When I start it in debug mode, I can also see that the records are Avro records before they get sent with the `streams.to` call. It's a single input topic but several output topics (at the moment 4). The output topics do not exist yet. – Tim.G. Dec 26 '17 at 18:52
  • The magic byte is a byte that the Confluent client adds as a marker in front of a message serialized with Avro. That error may mean that you are trying to deserialize a message with the Confluent client, but the message was not serialized as Avro by a Confluent client. Are you mixing Confluent and vanilla Kafka clients? – Luciano Afranllie Dec 26 '17 at 20:05
  • @Luciano Afranllie What is the best way to check that? – Tim.G. Dec 26 '17 at 20:13
  • Actually, it's not the Confluent client I meant, but the Confluent KafkaAvroSerializer. Check if your producers are configured to use that value serializer, like in [this example](https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html) – Luciano Afranllie Dec 26 '17 at 22:50
  • Ah, yes, I am. But unlike the examples, I'm using the options `KEY_SERDE_CLASS_CONFIG` and `VALUE_SERDE_CLASS_CONFIG` in combination with the `GenericAvroSerde` from here: https://github.com/confluentinc/schema-registry/blob/master/avro-serde/src/main/java/io/confluent/kafka/streams/serdes/avro/GenericAvroSerde.java – Tim.G. Dec 27 '17 at 15:22
  • I think I found the problem... Thank you for asking about the serializer. I realized that when using the Serde I'm not just serializing: I'm also trying to deserialize text with Avro. – Tim.G. Dec 27 '17 at 15:32

1 Answer


The solution to my problem was to exchange the VALUE_SERDE: the String values from my input topic are deserialized with a String serde, and the Avro serde is only applied on the way out.

Since a Serde combines serialization and deserialization, I cannot simply use `StreamsConfig.VALUE_SERDE_CLASS_CONFIG, classOf[GenericAvroSerde]`, because the Avro serde would then also be used to deserialize the plain-text input records. Instead, I have to use a String serde for reading the input and only use an Avro serde when writing to the output topic.
It looks like this now:

import java.util.{Collections, Properties}
import org.apache.avro.generic.GenericRecord
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.{Serde, Serdes}
import org.apache.kafka.streams.StreamsConfig
import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde

// The default serdes in the streams configuration are String serdes, so the
// plain-text input topic can be read; they differ from the actual output serdes
val streamsConfiguration: Properties = {
  val p = new Properties()
  p.put(StreamsConfig.APPLICATION_ID_CONFIG, kStreamsConf.getString("APPLICATION_ID"))
  p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kStreamsConf.getString("BOOTSTRAP_SERVERS_CONFIG"))
  p.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, kStreamsConf.getString("AUTO_OFFSET_RESET_CONFIG"))
  p.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String.getClass.getName)
  p.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String.getClass.getName)
  p.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, kStreamsConf.getString("SCHEMA_REGISTRY_URL_CONFIG"))
  p
}

// Adjusted output serdes for the Avro records (GenericAvroSerde is a Serde[GenericRecord])
val keySerde: Serde[String] = Serdes.String
val valSerde: Serde[GenericRecord] = new GenericAvroSerde()
valSerde.configure(
  Collections.singletonMap(
    AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
    streamsConfiguration.get(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG)
  ),
  /* isKeySerde = */ false
)

// Now write to the output topic with the adjusted serdes
stream.to(keySerde, valSerde, "destTopic")

This way, it works like a charm.
Thank you!
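As a side note: if you are on Kafka 1.0 or newer, the same per-sink override can also be written with the non-deprecated `Produced` overload (the backticks are needed in Scala because `with` is a keyword); a minimal sketch reusing the serdes from above:

import org.apache.kafka.streams.kstream.Produced

// Equivalent sink with the newer API: override the default serdes only for this one topic
stream.to("destTopic", Produced.`with`(keySerde, valSerde))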
