
I am trying to understand how to use Spring Cloud Stream with the Kafka binder.

Currently, I am trying to register an Avro schema with my Confluent Schema Registry and send messages to a topic.

I am unable to understand how the schema registration is done by Spring Cloud Stream behind the scenes.

Let's take this example from the Spring Cloud Stream samples.

The Avro schema is located in src/resources/avro.

When the mvn compile goal is run, the POJO for the Avro schema is generated and the producer can post data.

But what I am not able to understand is how Spring Cloud Stream registers the Avro schema with the Schema Registry.

    @Autowired
    StreamBridge streamBridge;

    @Bean
    public Supplier<Sensor> supplier() {
        return () -> {
            Sensor sensor = new Sensor();
            sensor.setId(UUID.randomUUID().toString() + "-v1");
            sensor.setAcceleration(random.nextFloat() * 10);
            sensor.setVelocity(random.nextFloat() * 100);
            sensor.setTemperature(random.nextFloat() * 50);
            return sensor;
        };
    }

    @Bean
    public Consumer<Sensor> receiveAndForward() {
        return s -> streamBridge.send("sensor-out-0", s);
    }

    @Bean
    Consumer<Sensor> receive() {
        return s -> System.out.println("Received Sensor: " + s);
    }

Is it done when the beans are created?

Or is it done when the first message is sent? If so, how does Spring Cloud Stream know where to find the .avsc file?

Basically, what is happening under the hood?

There seems to be no mention of this in the docs.

Thanks.

ng.newbie

1 Answer


Your serialization strategy (in this case, Avro) is always handled by the serializers (for producers) and deserializers (for consumers).

You can have Avro-(de)serialized keys and/or values, which means passing KafkaAvroSerializer.class / KafkaAvroDeserializer.class into the producer/consumer configs, respectively. On top of this, you must also pass schema.registry.url into the client configs.
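For example, with the Kafka binder you can pass these straight through to the underlying Kafka clients via the binder's producer/consumer property maps. A sketch (the localhost URL is an assumption; adjust to your setup):

```properties
# Passed through to the underlying Kafka producer by the binder
spring.cloud.stream.kafka.binder.producer-properties.value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
spring.cloud.stream.kafka.binder.producer-properties.schema.registry.url=http://localhost:8081

# And to the underlying consumer
spring.cloud.stream.kafka.binder.consumer-properties.value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
spring.cloud.stream.kafka.binder.consumer-properties.schema.registry.url=http://localhost:8081
```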

So behind the scenes, Spring Cloud Stream makes your application Avro-compatible when it creates your producers/consumers (using the configs found in application.properties or elsewhere). Your clients will connect to the Schema Registry on startup (the logs will tell you if the connection failed), but no schema registration happens out of the box at that point.

Schema registration happens when the first message is sent. If you look at the generated POJOs, you'll see that they already contain the schema, so Spring Cloud Stream doesn't need the .avsc files at all. For example, my last generated Avro POJO contained:

@org.apache.avro.specific.AvroGenerated
public class AvroBalanceMessage extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
    private static final long serialVersionUID = -539731109258473824L;
    public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"AvroBalanceMessage\",\"namespace\":\"tech.nermindedovic\",\"fields\":[{\"name\":\"accountNumber\",\"type\":\"long\",\"default\":0},{\"name\":\"routingNumber\",\"type\":\"long\",\"default\":0},{\"name\":\"balance\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"default\":\"0.00\"},{\"name\":\"errors\",\"type\":\"boolean\",\"default\":false}]}");
    public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
    ....... 

When the producer sends this POJO, the serializer first checks the schema against the registry. If the schema is not in the registry yet, the registry stores it and assigns it an ID. The producer then sends the message, with the schema ID embedded, to the Kafka broker. On the consumer side, the deserializer reads the ID from the message and checks whether it has seen it before (IDs are cached so you don't always have to retrieve the schema from the registry); if it hasn't, it contacts the registry to fetch the schema for the message.
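Concretely, what travels on the wire is the Confluent framing: a magic byte, the 4-byte schema ID, then the Avro-encoded payload. A minimal sketch of that framing (an illustration of the format, not the actual KafkaAvroSerializer implementation):

```java
import java.nio.ByteBuffer;

// Illustration only: the wire format KafkaAvroSerializer produces.
public class WireFormatSketch {
    static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put((byte) 0x0);   // magic byte (format version 0)
        buf.putInt(schemaId);  // schema ID assigned by the registry, big-endian
        buf.put(avroPayload);  // Avro binary-encoded record bytes
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] framed = frame(42, new byte[] {1, 2, 3});
        System.out.println(framed.length); // 8 = 1 (magic) + 4 (ID) + 3 (payload)
    }
}
```

The deserializer simply reverses this: it reads the ID from bytes 1-4, looks the schema up (cache first, registry on a miss), then decodes the remaining bytes.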

A bit outside the scope of Spring Cloud Stream, but you can also use the Schema Registry's REST API to register schemas manually.
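For example, posting a new schema version to a subject (the localhost URL and the sensor-value subject name are placeholders for your own setup):

```
curl -X POST \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"Sensor\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}' \
  http://localhost:8081/subjects/sensor-value/versions
```

The response contains the ID the registry assigned, the same ID the serializer would embed in outgoing messages.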

Nerm