
I just started working with Kafka. I use Protocol Buffers for the message format, and I just learned about the schema registry.

To give some context: we are a small team with about a dozen web services that communicate over Kafka. We keep all the schemas and read/write models in a library that each service imports, so every service knows how to serialize and deserialize the messages.
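
To make that setup concrete, here is a minimal sketch of what such a shared-library serializer/deserializer pair might look like, assuming a hypothetical OrderCreated message generated by protoc into the shared library (the class and message names are illustrative, not from the question):

```java
import com.google.protobuf.InvalidProtocolBufferException;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// Lives in the shared schema library that every service imports.
// OrderCreated is a hypothetical class generated by protoc from a .proto
// file kept in that same library.
public class OrderCreatedSerdes {

    public static class OrderCreatedSerializer implements Serializer<OrderCreated> {
        @Override
        public byte[] serialize(String topic, OrderCreated event) {
            // Plain Protobuf wire format: no schema ID travels with the payload,
            // so every consumer must already ship a compatible generated class.
            return event.toByteArray();
        }
    }

    public static class OrderCreatedDeserializer implements Deserializer<OrderCreated> {
        @Override
        public OrderCreated deserialize(String topic, byte[] bytes) {
            try {
                return OrderCreated.parseFrom(bytes);
            } catch (InvalidProtocolBufferException e) {
                throw new SerializationException("Could not parse OrderCreated payload", e);
            }
        }
    }
}
```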

But now the schema registry comes into play. Why use it? My infrastructure becomes more complicated, I need to update the registry every time I change a schema, and I still need to define the read/write models in each service just like I do now with the library.

So from my point of view I only see cons; it mainly just complicates things. Why should I use a schema registry?

Thanks

  • What feedback do you have about this? https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/ – OneCricketeer Jul 22 '22 at 15:44
  • 1
    I will tackle the points in that article. 1 - It's a small team and the schemas are shared via a dependency. 2 - I agree with this state. 3 - If my schemas are backwards compatible this isn't an issue. 4 - Makes no difference for protobufs. 5 - Doesn't apply to my use case. 6 - Again shared dependency. 7 - We already do this on code reviews. – kylie.zoltan Jul 22 '22 at 15:52
  • Another thing to consider - do you plan on using Kafka Connect at any point? You'd need to write your own Converter class, or [maybe use this one](https://github.com/blueapron/kafka-connect-protobuf-converter). If you want to use KSQL, though, then Protobuf needs to use the Schema Registry... Otherwise, if you're fine with plain Kafka features and re-building everything else, then that's fine. – OneCricketeer Jul 22 '22 at 15:59

1 Answer


The schema registry ensures your messages do not deviate from a common compatibility baseline (the first registered version of the schema).

For example, you have a schema that describes an event like {"first_name": "Jane", "last_name": "Doe"}, but later decide that names can actually have more than two parts, so you move to a schema that supports {"name": "Jane P. Doe"}. You still need a way to deserialize old data with first_name and last_name fields while you migrate to the new schema that only has name, so consumers need both schema versions. The registry stores them, and the producer encodes the schema ID within each payload; after all, the initial events with the two name fields know nothing about the "future" schema with only name.
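
For concreteness, a minimal producer sketch using Confluent's registry-aware Protobuf serializer might look like the following; the Person class, topic name, and localhost URLs are assumptions for illustration, and the serializer class name is Confluent-specific:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PersonProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Swapping in the registry-aware serializer is the main code change.
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer");
        // The serializer registers (or looks up) the message's schema in the
        // registry and prepends the returned schema ID to every payload.
        props.put("schema.registry.url", "http://localhost:8081");

        // Person is a hypothetical generated Protobuf class using the newer
        // single "name" field from the example above.
        try (KafkaProducer<String, Person> producer = new KafkaProducer<>(props)) {
            Person person = Person.newBuilder().setName("Jane P. Doe").build();
            producer.send(new ProducerRecord<>("people", "jane", person));
        }
    }
}
```

A consumer reading older records can then fetch the writer's schema by the ID embedded in each payload, so it can still decode events produced with the first_name/last_name version.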

You say your models are shared in libraries across services. You presumably then have some regression testing and a release cycle to publish those libraries between services? The registry will allow you to centralize that logic.

OneCricketeer
  • About your first point, I just need the latest schema, since protobufs are forward and backward compatible. About your second point, I just publish them together, so I only need to update the dependency version and everything works. – kylie.zoltan Jul 22 '22 at 15:39
  • The registry will just add more complexity to the code base, plus it is another thing to maintain. I still don't see the need for it. – kylie.zoltan Jul 22 '22 at 15:41
  • The schema is not guaranteed to be forward/backward compatible. In the given example, the "new schema" has no way to know what to do with "first/last name"; it only knows "name". Another example: what stops any client from sending random garbage into your topics without any schema, or with a completely different schema? As for the added complexity - what complexity? You just change your de/serializer in the client, and that's it. Everything else is automatic, mostly. – OneCricketeer Jul 22 '22 at 15:42
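
As a rough illustration of the "just change your de/serializer" point in the last comment, a consumer sketch assuming Confluent's Protobuf deserializer (class names, topic, and URLs are assumptions, not from the thread) could look like this:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import com.google.protobuf.DynamicMessage;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PersonConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "people-reader");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The registry-aware deserializer reads the schema ID from each payload
        // and fetches the matching writer schema from the registry as needed.
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");

        // Without configuring a specific generated type, the Confluent
        // deserializer typically yields a DynamicMessage built from the
        // writer's schema, old or new.
        try (KafkaConsumer<String, DynamicMessage> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("people"));
            ConsumerRecords<String, DynamicMessage> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.println(r.value()));
        }
    }
}
```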