SchemaRegistry helps with sharing the write Avro schema, which is used to encode a message, with the consumers that need the write schema to decode the received message. Another important feature is assisting the schema evolution.
Let's say a producer P defines a write Avro schema v1 that is stored under the logical schema S, a consumer C1 that defines a read (projection) schema v1 and another consumer C2 that defines its own read (projection) schema. The read schemas are not shared as they are used locally by Avro to translate messages from the writer schema into the reader schema.
Imagine the schema evolution without any breaking changes:
- The consumer C1 requests a new property by the new optional field added to its schema. This is a backward-compatible change. Messages encoded without this field will be still translated into the read schema. Now we've got v2 of the C1's read schema.
- The producer P satisfies the consumer C1's need by the new field added to its schema. The field doesn't have to be required as this is a forwards-compatible change. The consumer C1 will access the data encoded in the newly added field. The consumer C2 will simply ignore it, as it is a tolerant reader. Now we've got v2 of the P's write schema. Consumers need to know the exact schema with which the messages were written, so the new version is stored under the logical schema S.
Now imagine some schema breaking changes:
- The producer P decides to delete a non-optional field. One of the consumers might use this field. This is not a forwards-compatible change. Assuming the subject S is configured with FORWARD_TRANSITIVE compatibility type, the attempt to store the new write schema will fail. We are safe.
- The consumer C2 requests a new property by the new field added to its schema. Since it's not written by the producer, this is not a backward-compatible change.
The question is how can the SchemaRegistry come in handy to prevent any breaking changes on the consumer side? Note that the compatibility check of the read schema has to be done against all versions of the write schema.
There is an endpoint that allows checking the compatibility against the versions in the subject. The issue is that it uses the compatibility type that is set on the subject. The subject which contains versions of the write schema can not be used, because it is configured with FORWARD_TRANSITIVE compatibility type, but the read schema has to be backward compatible.
Creating another subject with the compatibility type BACKWARD_TRANSITIVE will not work, because a new version of the write schema with a forwards-compatible change (e.g. add a non-optional field) will fail to be stored in this subject.
One option that came to mind is to have some unit tests written using the CompatibilityChecker. It's an ugly solution because each consumer must hold locally all versions of the write schema. It's going to be a pain to sync all the consumers when the producer's schema changes.