
I was wondering: can I use the Confluent Schema Registry to generate (and then send to Kafka) schemaless Avro records? If so, can somebody please share some resources for it? I am not able to find any examples on the Confluent website or on Google.

I have a plain delimited file and a separate schema for it. Currently I am using the Avro GenericRecord API to serialize the Avro records and send them through Kafka. This way the schema is still attached to every record, which makes it bulkier. My reasoning is that if I remove the schema from each record sent through Kafka, I should get higher throughput.

OneCricketeer
Explorer
  • Why would you want to use a schema registry to send schemaless records? I am confused. – Fabien Jul 13 '17 at 02:46
  • Actually I am currently using the Avro GenericRecord API to generate Avro records from CSV, so my understanding is that the schema is appended to the Avro binary records while sending them to Kafka, which makes my Kafka payload bulkier. – Explorer Jul 13 '17 at 02:49
  • I am not aware that you can natively dissociate Avro data from the schema incorporated in it... But it seems that Kafka implements specific serializers for Avro that strip off the Avro schema for transfer: https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroData.java – Fabien Jul 13 '17 at 02:57

2 Answers


The Confluent Schema Registry will send Avro messages serialized without the entire Avro Schema in the message. I think this is what you mean by "schema less" messages.

The Confluent Schema Registry will store the Avro schemas and only a short index id is included in the message on the wire.
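The framing the Confluent serializers use is small: a single magic byte (0), a 4-byte big-endian schema id, and then the raw Avro-encoded payload. A minimal sketch of that wire format in Python (the schema id and payload bytes below are made up for illustration; real producers get the id back from the registry):

```python
import struct

MAGIC_BYTE = 0  # Confluent wire-format version marker

def frame(schema_id, avro_payload):
    """Prefix an Avro-encoded payload with the 5-byte Confluent header."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe(message):
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte: %d" % magic)
    return schema_id, message[5:]

msg = frame(42, b"\x06foo")           # 5-byte header + payload
schema_id, payload = unframe(msg)
print(schema_id, payload)             # 42 b'\x06foo'
```

Note that only those 5 extra bytes travel with each record, instead of the full JSON schema, which is the throughput win the question is after.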

The full docs, including a quickstart guide for testing the Confluent Schema Registry, are here:

http://docs.confluent.io/current/schema-registry/docs/index.html

Hans Jespersen
  • Thanks for your answer. I have a plain delimited file, so how can I append the registry schema id to it and send it through Kafka? Do you have any example of it? – Explorer Jul 13 '17 at 03:05
  • There are examples of how to publish using the Kafka Java Producer API in the docs. Specifically here http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html – Hans Jespersen Jul 13 '17 at 03:34
  • Also, you can publish Avro messages to Kafka using the REST API. Documentation and examples are here http://docs.confluent.io/current/kafka-rest/docs/intro.html – Hans Jespersen Jul 13 '17 at 03:36
  • 1
    Thanks Hans, I was checking the serializer code and I have a question there, why there is a need to use the userSchema even if the schema is already registered in Schema Registry? – Explorer Jul 13 '17 at 17:19
  • It's optional, but the publisher might be registering a newer version of the schema than the one that's in the registry. – Hans Jespersen Jul 13 '17 at 17:40
  • How can I use the existing schema to serialize the data? I tried to pull the schema from the Schema Registry and use it like `val testschema = scala.io.Source.fromURL(url).mkString`, but it failed while parsing with `org.apache.avro.SchemaParseException: No type` on the line `Schema schema = parser.parse(userSchema);` – Explorer Jul 13 '17 at 20:23
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/149169/discussion-between-liveandletlive-and-hans-jespersen). – Explorer Jul 13 '17 at 20:27
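The `SchemaParseException` in that last comment is most likely because the registry's REST endpoint does not return the bare Avro schema: it returns a JSON envelope with `subject`, `version`, `id`, and `schema` fields, so the `schema` field has to be extracted before handing the string to the Avro parser. A minimal Python sketch (the envelope below is a hand-written sample, not live registry output):

```python
import json

# Sample of what GET /subjects/<subject>/versions/1 returns:
# a JSON envelope, not the bare Avro schema.
registry_response = json.dumps({
    "subject": "topic-value",
    "version": 1,
    "id": 21,
    "schema": "{\"type\": \"string\"}",
})

envelope = json.loads(registry_response)
avro_schema_json = envelope["schema"]  # this string is what the Avro parser expects

parsed = json.loads(avro_schema_json)
print(parsed)  # {'type': 'string'}
```

Feeding the whole envelope to `Schema.Parser` fails with "No type" because the envelope's top-level JSON has no Avro `type` field.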

You can register your Avro schema for the first time with the following command from the command line:

curl -X POST -i -H "Content-Type: application/vnd.schemaregistry.v1+json" \
        --data '{"schema": "{\"type\": \"string\"}"}' \
        http://localhost:8081/subjects/topic

You can see all registered versions for the subject using:

curl -X GET -i http://localhost:8081/subjects/topic/versions

To see the complete Avro schema for version 1 of the versions present in the Confluent Schema Registry, use the command below; it will show the schema in JSON format:

curl -X GET -i http://localhost:8081/subjects/topic/versions/1

Avro schema registration is normally the task of the Kafka producer.

Once the schema is in the Confluent Schema Registry, you just need to publish Avro generic records to the specific Kafka topic, in our case 'topic'.
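For the Java producer, this mostly comes down to configuration; a minimal sketch of the properties involved (broker and registry addresses are placeholders for your own environment):

```properties
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
```

With `KafkaAvroSerializer` configured, the producer registers the schema (if needed) and writes only the short schema id into each record.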

Kafka consumer: use the code below to fetch the latest schema for a specific Kafka topic.

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.avro.Schema

val schemaReg = new CachedSchemaRegistryClient(kafkaAvroSchemaRegistryUrl, 100)
val schemaMeta = schemaReg.getLatestSchemaMetadata(kafkaTopic + "-value")
val schemaString = schemaMeta.getSchema
val schema = new Schema.Parser().parse(schemaString)

The above is used to get the schema, which can then be used with the Confluent decoder to decode records from the Kafka topic.
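To make the decode step concrete: the consumer strips the 5-byte wire-format header, looks up the schema by id (or uses the latest, as above), and Avro-decodes the rest. As a self-contained illustration, here is the header split plus a hand-decoded Avro `string` payload in Python (real code would use the Avro library's `DatumReader`; the message bytes below are hand-built):

```python
import struct

def decode_avro_string(buf):
    """Hand-decode a single Avro `string`: zigzag varint length, then UTF-8 bytes."""
    shift, raw, pos = 0, 0, 0
    while True:
        b = buf[pos]; pos += 1
        raw |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    length = (raw >> 1) ^ -(raw & 1)  # zigzag -> signed length
    return buf[pos:pos + length].decode("utf-8")

# A framed message: magic byte 0, schema id 21, then Avro-encoded "foo"
message = struct.pack(">bI", 0, 21) + b"\x06foo"
schema_id = struct.unpack(">I", message[1:5])[0]
print(schema_id, decode_avro_string(message[5:]))  # 21 foo
```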

OneCricketeer
Sagar balai