
I have a Kafka topic whose Avro messages were produced with KafkaAvroSerializer.
I am using Confluent 4.0.0 to run Kafka Connect; my standalone worker properties are as follows:

key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=<schema_registry_hostname>:8081
value.converter.schema.registry.url=<schema_registry_hostname>:8081
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

When I run the Kafka Connect HDFS sink connector in standalone mode, I get this error message:

[2018-06-27 17:47:41,746] ERROR WorkerSinkTask{id=camus-email-service-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.DataException: Invalid JSON for record default value: null
    at io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1640)
    at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1527)
    at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1410)
    at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1290)
    at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1014)
    at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:88)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:454)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:287)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2018-06-27 17:47:41,748] ERROR WorkerSinkTask{id=camus-email-service-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
[2018-06-27 17:52:19,554] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect).

When I use kafka-avro-console-consumer and pass the schema registry URL, the Kafka messages are deserialized correctly, e.g.:

/usr/bin/kafka-avro-console-consumer --bootstrap-server <kafka-host>:9092 --topic <KafkaTopicName> --property schema.registry.url=<schema_registry_hostname>:8081
– Rupesh More
  • I think kafka-avro-console-consumer uses KafkaAvroDeserializer underneath. Not sure if the AvroConverter uses the same. – Rupesh More Jun 27 '18 at 18:36
  • It does https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroConverter.java – OneCricketeer Jun 28 '18 at 03:58

2 Answers


Changing the "subscription" field's datatype to a union type fixed the issue; the AvroConverter was then able to deserialize the messages, as shown in the sketch below.
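
For reference, a sketch of the change, reconstructed from the escaped schemas quoted in the comments under the other answer (the field names and doc strings come from the original schema). Before, the "subscription" field was a plain record type carrying "default": null, which is not a valid JSON default for a record:

{"name": "subscription",
 "type": {"type": "record", "name": "Subscription", "doc": "Template subscription information",
          "fields": [{"name": "subscriptionId", "type": ["null", "int"], "default": null},
                     {"name": "channelId", "type": ["null", "int"], "default": null}]},
 "default": null}

After wrapping the record in a union with "null", the null default becomes legal and the AvroConverter can build the Connect schema:

{"name": "subscription",
 "type": ["null",
          {"type": "record", "name": "Subscription", "doc": "Template subscription information",
           "fields": [{"name": "subscriptionId", "type": ["null", "int"], "default": null},
                      {"name": "channelId", "type": ["null", "int"], "default": null}]}],
 "default": null}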

– Rupesh More

I think your Kafka key is null, which is not Avro.

Or it is some other type but malformed, and cannot be converted to a RECORD datatype. See the AvroData source code:

case RECORD: {
    if (!jsonValue.isObject()) {
      throw new DataException("Invalid JSON for record default value: " + jsonValue.toString());
    }
    // ... (remainder of the RECORD case)
}

UPDATE: According to your comment, you can see this is true:

$ curl -X GET localhost:8081/subjects/<kafka-topic>-key/versions/latest    
{"subject":"<kafka-topic>-key","version":2,"id":625,"schema":"\"bytes\""}

In any case, HDFS Connect does not natively store the key, so rather than using Avro, try not deserializing the key at all:

key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
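
Putting that together, a minimal sketch of the converter section of the standalone worker properties, assuming the rest of your file stays as posted (ByteArrayConverter takes no schema registry settings, so the key.converter.schema.registry.url and key.converter.schemas.enable lines can be dropped):

# Pass the raw key bytes through untouched; no Schema Registry lookup for keys
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
# Values stay on Avro, as before
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=<schema_registry_hostname>:8081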

Also, your console consumer is not printing the key, so your test isn't adequate. You need to add --property print.key=true

– OneCricketeer
  • Thanks @cricket_007. When I run kafka-avro-console-consumer with --property print.key=true, I see the key: /usr/bin/kafka-avro-console-consumer --bootstrap-server <kafka-host>:9092 --topic <KafkaTopicName> --property schema.registry.url=<schema_registry_hostname>:8081 --from-beginning --max-messages 1 --property print.key=true "¶2âwÌS@¼T%\u0005ãé" {"id":{"string":"b632e277-cc53-4093-9cbc-86542505e3e9 ........} – Rupesh More Jun 28 '18 at 15:35
  • So I see a key, and we are using KafkaAvroSerializer at the producer for both key and value. In the schema registry, though, I see the datatype as "bytes". The producer registers this key schema: $ curl -X GET http://localhost:8081/subjects/<kafka-topic>-key/versions/latest {"subject":"<kafka-topic>-key","version":2,"id":625,"schema":"\"bytes\""} Do you think it is an issue with the key? – Rupesh More Jun 28 '18 at 15:50
  • Also, how do you ignore the key in the standalone properties? When I do not specify a key converter, I get this error message: Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "key.converter" which has no default value. – Rupesh More Jun 28 '18 at 15:50
  • @RupeshMore Updated the answer with the `key.converter` you need for `bytes` data – OneCricketeer Jun 28 '18 at 16:19
  • I tried key.converter=org.apache.kafka.connect.converters.ByteArrayConverter, but I am still receiving: org.apache.kafka.connect.errors.DataException: Invalid JSON for record default value: null Any other pointers, @cricket_007? Thank you! – Rupesh More Jun 28 '18 at 16:46
  • What is the value schema? Are any of your values null? – OneCricketeer Jun 28 '18 at 17:01
  • I suspect the Avro schema of the column below; maybe null needs to be included in its type. {\"name\":\"subscription\",\"type\":{\"type\":\"record\",\"name\":\"Subscription\",\"doc\":\"Template subscription information\",\"fields\":[{\"name\":\"subscriptionId\",\"type\":[\"null\",\"int\"],\"default\":null},{\"name\":\"channelId\",\"type\":[\"null\",\"int\"],\"default\":null}]},\"default\":null} But then kafka-avro-console-consumer is able to decode it as below: "subscription":{"subscriptionId":null,"channelId":null} – Rupesh More Jun 28 '18 at 17:20
  • `Subscription` field is a record type, so it is getting into that case statement in the code. When `!jsonValue.isObject()` is true it might be reading `"default":null` for that entire record. – OneCricketeer Jun 28 '18 at 18:27
  • We changed the type to a union datatype and the AvroConverters were able to deserialize that column. {\"name\":\"subscription\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"Subscription\",\"doc\":\"Template subscription information\",\"fields\":[{\"name\":\"subscriptionId\",\"type\":[\"null\",\"int\"],\"default\":null},{\"name\":\"channelId\",\"type\":[\"null\",\"int\"],\"default\":null}]}],\"default\":null} – Rupesh More Jul 03 '18 at 14:49
  • For integer fields, I typically use default of `-1` rather than null (assuming your value cannot otherwise be negative), but if that fixed your problem, then you can post a separate answer rather than leave as a comment – OneCricketeer Jul 03 '18 at 16:11
  • Sorry, I am new to Stack Overflow and still learning the tips and tricks. Thanks @cricket_007 for your instant replies. – Rupesh More Jul 09 '18 at 02:00