
I have registered the following schema under a subject called physics in the Schema Registry.

 curl -X POST \
  http://localhost:8081/subjects/physics/versions \
  -H 'accept: application/vnd.schemaregistry.v1+json' \
  -H 'content-type: application/vnd.schemaregistry.v1+json' \
  -d '{"schema":"{\"type\":\"record\",\"name\":\"physics\",\"fields\":[{\"name\":\"ATTRIBUTE1\",\"type\":\"string\",\"default\":\"\"},{\"name\":\"ATTRIBUTE2\",\"type\":\"long\",\"default\":0},{\"name\":\"ATTRIBUTE3\",\"type\":\"string\",\"default\":\"\"},{\"name\":\"ATTRIBUTE4\",\"type\":\"string\",\"default\":\"\"},{\"name\":\"ATTRIBUTE5\",\"type\":\"string\",\"default\":\"\"}]}"}'
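For readability, the escaped schema string embedded in the curl payload can be parsed and pretty-printed with Avro's `Schema.Parser` (the class name `PrintPhysicsSchema` is just a placeholder for this sketch):

```java
import org.apache.avro.Schema;

public class PrintPhysicsSchema {
    public static void main(String[] args) {
        // The exact schema string embedded (escaped) in the curl payload above.
        String json = "{\"type\":\"record\",\"name\":\"physics\",\"fields\":["
                + "{\"name\":\"ATTRIBUTE1\",\"type\":\"string\",\"default\":\"\"},"
                + "{\"name\":\"ATTRIBUTE2\",\"type\":\"long\",\"default\":0},"
                + "{\"name\":\"ATTRIBUTE3\",\"type\":\"string\",\"default\":\"\"},"
                + "{\"name\":\"ATTRIBUTE4\",\"type\":\"string\",\"default\":\"\"},"
                + "{\"name\":\"ATTRIBUTE5\",\"type\":\"string\",\"default\":\"\"}]}";
        Schema schema = new Schema.Parser().parse(json);
        System.out.println(schema.toString(true));     // pretty-printed record schema
        System.out.println(schema.getFields().size()); // 5 fields, each with a default
    }
}
```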

My Kafka producer code is similar to this:

SchemaRegistryClient schemaRegistryClient = new CachedSchemaRegistryClient("http://localhost:8081", 20);
// ID is the schema id returned when the schema was registered
Schema schema = schemaRegistryClient.getBySubjectAndId("physics", ID);

Producer<String, GenericRecord> producer = new KafkaProducer<String, GenericRecord>(kafkaProps);
GenericRecord inputRecord = new GenericData.Record(schema);
inputRecord.put("ATTRIBUTE1", "val1");
inputRecord.put("ATTRIBUTE2", System.currentTimeMillis());
inputRecord.put("ATTRIBUTE3", "val3");
// ATTRIBUTE4 and ATTRIBUTE5 are intentionally left unset

ProducerRecord<String, GenericRecord> recordData =
        new ProducerRecord<String, GenericRecord>(topic, subjectName, inputRecord);
producer.send(recordData).get();
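Note that Avro field defaults are applied by a *reader* resolving a writer's schema that omits the field; they are not substituted when a field of a GenericRecord is simply left null at serialization time, which is what produces the NPE below. One workaround is to copy the declared defaults into the record before sending. This is a minimal sketch: `fillDefaults` and `FillDefaultsSketch` are hypothetical names, and the two-field schema here is a stand-in with the same shape as the registered physics schema (assumes Avro 1.8+):

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class FillDefaultsSketch {
    // Hypothetical helper: copy each field's declared default into the record
    // for any field the caller left unset, so the serializer never sees null.
    static void fillDefaults(GenericRecord record) {
        for (Schema.Field field : record.getSchema().getFields()) {
            if (record.get(field.name()) == null && field.defaultVal() != null) {
                record.put(field.name(), GenericData.get().getDefaultValue(field));
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in with the same shape as the registered "physics" schema (two fields shown).
        Schema schema = SchemaBuilder.record("physics").fields()
                .name("ATTRIBUTE1").type().stringType().stringDefault("")
                .name("ATTRIBUTE4").type().stringType().stringDefault("")
                .endRecord();

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("ATTRIBUTE1", "val1"); // ATTRIBUTE4 left unset, as in the question
        fillDefaults(rec);
        System.out.println(rec.get("ATTRIBUTE4") != null); // true: default "" was filled in
    }
}
```

Calling a helper like this just before `producer.send(...)` would avoid the `null of string in field ATTRIBUTE4` error without changing the registered schema.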

Compatibility: Backward
Serializer: KafkaAvroSerializer

While running the Kafka producer I get the exception below.

Exception in thread "main" org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: java.lang.NullPointerException: null of string in field ATTRIBUTE4 of physics
    at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:145)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:139)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
    at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:92)
    at kafka.MyKafkaAvroSerializer.serialize(MyKafkaAvroSerializer.java:27)
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:453)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:430)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:353)
    at kafka.MyAvroProducerV3.main(MyAvroProducerV3.java:62)
Caused by: java.lang.NullPointerException
    at org.apache.avro.io.Encoder.writeString(Encoder.java:121)
    at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:267)
    at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:262)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:128)

If I send an extra attribute, e.g. ATTRIBUTE6, from the Kafka producer, an exception is thrown saying ATTRIBUTE6 is not registered, which is perfectly fine.

Since I registered 5 attributes (ATTRIBUTE1, ATTRIBUTE2, ATTRIBUTE3, ATTRIBUTE4, ATTRIBUTE5), if I send only 3 of them (ATTRIBUTE1, ATTRIBUTE2, ATTRIBUTE3), the consumer should receive those 3 attributes and the default value (e.g. -1) for the missing ones. So why am I getting the above exception?

Am I missing something?

  • You can define the schema in the code and the producer will auto register... There's no reason to POST ahead of time, then do a lookup – OneCricketeer Sep 16 '18 at 21:44
  • In this case I will have different (maybe millions of) versions of the same schema. Let's assume I start working with 5 attributes and over time the number of attributes grows to 500. My input source may send only 2 attributes one time and 5, 6, 100, 34, or 54 attributes another time. Creating the schema in the producer code each time and auto-registering it would create many versions. Taking permutations and combinations of 500 attributes, it could reach a million versions of the same schema. – tryingSpark Sep 17 '18 at 05:21
  • One possible way to avoid a million schema versions: while building the schema in the producer code, I will enforce a fixed order of attributes in the generated schema. Say the input arrives as attribute1, attribute3, attribute2 OR attribute3, attribute1, attribute2 OR attribute1, attribute2, attribute3 ... I will always generate the schema in the fixed order attribute1, attribute2, attribute3, so I will have at most 500 schemas in this case. – tryingSpark Sep 17 '18 at 05:28
  • If you are just having random numbered attributes, I would suggest you use the map or array types of Avro. Regarding the question `null of string in field ATTRIBUTE4` doesn't make sense unless `ID` is not the value of the shown schema you posted. – OneCricketeer Sep 17 '18 at 07:21
  • Yes that should work. I will try this. Thanks for the pointer. – tryingSpark Sep 17 '18 at 15:13
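The map-type suggestion in the comments could look roughly like this. It is a sketch, not a definitive design: the record/field names are hypothetical, and string values are assumed for simplicity (a union of value types would be needed to also carry a long like ATTRIBUTE2):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class MapSchemaSketch {
    public static void main(String[] args) {
        // A single schema version: one map field holds however many attributes arrive,
        // so varying attribute sets never force a new schema registration.
        Schema schema = SchemaBuilder.record("physics").fields()
                .name("attributes").type().map().values().stringType().noDefault()
                .endRecord();

        Map<String, String> attrs = new HashMap<>();
        attrs.put("ATTRIBUTE1", "val1");
        attrs.put("ATTRIBUTE3", "val3"); // any subset, any count: the schema never changes

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("attributes", attrs);
        System.out.println(rec); // record with the two attributes present in the map
    }
}
```

With this shape, consumers simply see whichever keys the producer sent, and "missing" attributes are absent from the map rather than needing per-field defaults.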

0 Answers