
I am trying to set up a Debezium MySQL source connector. My goal is to have one topic per database, so I am investigating the possibility of leveraging subjects, such that a topic can contain different message types whose schemas are stored in the Confluent Schema Registry.

Following several answers here, I have set the key and value converter subject name strategy to io.confluent.kafka.serializers.subject.TopicRecordNameStrategy.
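For context, here is a minimal Python sketch of how the three Confluent subject name strategies differ, with the concatenation rules assumed from Confluent's documentation (the real implementations live in `io.confluent.kafka.serializers.subject`); the Debezium-style record name used below is just an example:

```python
# Simplified sketch of Confluent Schema Registry subject name strategies.

def topic_name_strategy(topic, record_name, is_key):
    # Default: one subject per topic, so effectively one schema type per topic.
    return f"{topic}-{'key' if is_key else 'value'}"

def record_name_strategy(topic, record_name, is_key):
    # One subject per record type, regardless of which topic it lands in.
    return record_name

def topic_record_name_strategy(topic, record_name, is_key):
    # One subject per (topic, record type) pair: several message types can
    # share a topic, each keeping its own schema compatibility history.
    return f"{topic}-{record_name}"

# Debezium names its value records "<server>.<db>.<table>.Value", which is
# why the observed subjects end in "-Value" under this strategy.
print(topic_record_name_strategy("db1_schema", "aws-db.db1.table1.Value", False))
# db1_schema-aws-db.db1.table1.Value
```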

To reroute all the messages coming from the same schema to the same topic, I am using the following configuration:

{
  "name": "aws-db-connector",
  "config": {
    "group.id": "aws-db-group",
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "root",
    "database.password": "secret-pw",
    "database.server.id": "184054",
    "database.server.name": "aws-db",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.aws-db",
    "database.include.list": "db1,db2",
    "transforms": "unwrap,Reroute",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "transforms.unwrap.add.fields": "db,table,op,source.ts_ms",
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.Reroute.topic.regex": "(.*\\S)\\.(.*\\S)\\.(.*\\S)",
    "transforms.Reroute.topic.replacement": "$2_schema",
    "transforms.Reroute.key.field.name": "table",
    "transforms.Reroute.key.field.regex": "(.*\\S)\\.(.*\\S)\\.(.*\\S)",
    "transforms.Reroute.key.field.replacement": "$3"
  }
}
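As a sanity check, the reroute regex can be exercised outside Connect. A small Python sketch (Java's `$2` becomes `\2` in Python; the topic name is an example) of where a Debezium topic ends up, and which value the `key.field` regex extracts:

```python
import re

# Same pattern as transforms.Reroute.topic.regex / key.field.regex.
PATTERN = r"(.*\S)\.(.*\S)\.(.*\S)"

def reroute(topic):
    # Collapses "<server>.<database>.<table>" into "<database>_schema".
    return re.sub(PATTERN, r"\2_schema", topic)

def key_field(topic):
    # key.field.replacement picks group 3, the table name, which is stored
    # in the extra "table" field added to the key.
    return re.sub(PATTERN, r"\3", topic)

print(reroute("aws-db.db1.table1"))    # db1_schema
print(key_field("aws-db.db1.table1"))  # table1
```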

In my docker-compose file I have set:

- CONNECT_KEY_CONVERTER_KEY_SUBJECT_NAME_STRATEGY=io.confluent.kafka.serializers.subject.TopicRecordNameStrategy
- CONNECT_VALUE_CONVERTER_VALUE_SUBJECT_NAME_STRATEGY=io.confluent.kafka.serializers.subject.TopicRecordNameStrategy
- CONNECT_KEY_CONVERTER=io.confluent.connect.avro.AvroConverter
- CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL=http://registry:8081
- CONNECT_VALUE_CONVERTER=io.confluent.connect.avro.AvroConverter
- CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=http://registry:8081

For values, this is working flawlessly. My schema registry contains multiple subjects in the format <TopicName>-<RecordName>-Value, where TopicName is the topic I am rerouting the data to, and RecordName is the "old" topic name created by Debezium, in the format server_name.database_name.table_name.

For keys, unfortunately, this strategy is not working as expected: I get only one key subject, and it looks like RecordName contains the new topic name instead of the original one. This leads to collisions and incompatibility errors when a field with the same name has different types in different tables.

Is there any way to provide the proper RecordName when the key subject is generated?

EDIT - adding example:

Let's suppose my database contains three tables, table1, table2 and table3.

Table1:

CREATE TABLE `table1` (
    `id` INT NOT NULL AUTO_INCREMENT,
    `name` TEXT,
    PRIMARY KEY (`id`)
);

Table2:

CREATE TABLE `table2` (
    `id` INT NOT NULL AUTO_INCREMENT,
    `name` BINARY,
    PRIMARY KEY (`id`)
);

Table3:

CREATE TABLE `table3` (
    `id` BINARY NOT NULL,
    `name` INT,
    PRIMARY KEY (`id`)
);

Running Debezium with the above configuration creates the following value subjects in the schema registry:

  • db1_schema.db1-aws-db.table1-Value
  • db1_schema.db1-aws-db.table2-Value

And the following Key subject:

  • db1_schema.db1_schema-Key

When table3 is processed, the Debezium connector fails, because the id column is already registered in the schema registry subject with type int, which is incompatible with table3's bytes type. Therefore I get this error:

Schema being registered is incompatible with an earlier schema; error code: 409
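The conflict can be reproduced structurally. Below is a Python sketch with hypothetical, simplified Avro key schemas for table1 and table3 (the real check is Avro schema resolution performed by the registry, not this naive comparison); because both records land in the single key subject, the second registration fails the compatibility check:

```python
# Hypothetical, simplified Avro key schemas, both landing in the one
# key subject "db1_schema.db1_schema-Key".
table1_key = {"type": "record", "name": "Key",
              "fields": [{"name": "id", "type": "int"}]}
table3_key = {"type": "record", "name": "Key",
              "fields": [{"name": "id", "type": "bytes"}]}

def naive_backward_compatible(old, new):
    # Crude stand-in for the registry's backward-compatibility check:
    # a field present in both schemas must keep the same type here
    # (the real rules also allow certain type promotions).
    old_types = {f["name"]: f["type"] for f in old["fields"]}
    return all(old_types.get(f["name"], f["type"]) == f["type"]
               for f in new["fields"])

print(naive_backward_compatible(table1_key, table3_key))  # int vs bytes clash
```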

What I would expect is the creation of separate subjects for the keys as well:

  • db1_schema.aws-db-db1.table1-Key
  • db1_schema.aws-db-db1.table2-Key
  • db1_schema.aws-db-db1.table3-Key

In such a way that messages with different key schemas could be stored in the same topic.

Vektor88
  • I assume you are using Avro? What does the key schema actually look like? – OneCricketeer Jun 03 '22 at 16:54
  • @OneCricketeer the key schema has an `id` column that is int in some tables and binary in some others. Since the schema registry doesn't write to different key subjects, this throws an incompatibility error – Vektor88 Jun 04 '22 at 08:43
  • Can you edit the question to include your converter classes and the schemas? I was more interested in its schema record name. If it's just a primitive integer, then there will be no record name, so it's unclear what you expected – OneCricketeer Jun 04 '22 at 13:34
  • @OneCricketeer I would expect one key subject in the schema registry for each corresponding table/event-type, in the same way it happens for values. This is not happening and leads to all the event types trying to write their schema to the same subject. – Vektor88 Jun 04 '22 at 17:01
  • I understood that from the original post, yes. But the keys aren't directly related to the table names. As I said, if it's only an integer or string ID, then they'll only ever be _that type_ of schema. There is no "record name" for Avro `{"type":"int"}` for example, so all tables with integer keys will use the same subject. But that really shouldn't matter for consumers, so please clarify what you mean by "collisions and incompatibility". Can you show a [mcve]? – OneCricketeer Jun 05 '22 at 13:33
  • @OneCricketeer I have tried to provide an example. By collision I mean that all the key schemas are written to the same key subject; there's no differentiation as happens with the value schemas. This leads to an incompatibility because a key column already declared with a certain type is suddenly updated to a different type, throwing a backwards compatibility error. This shouldn't happen, because this key schema belongs to a totally different message type and should be inserted in a dedicated subject. – Vektor88 Jun 06 '22 at 07:27
  • Seems to me like your key converter should just use the default topic name strategy. Or add another transform, but separate out the different table types between different connector configs https://docs.confluent.io/platform/current/connect/transforms/setschemametadata.html – OneCricketeer Jun 06 '22 at 13:52
  • @OneCricketeer unfortunately if I do that, the name of the topic will become `db1_schema-Key`. For some reason the "original topic" name is used in the Value subject, but not in the key one. I investigated the `SetSchemaMetadata` transform, but I don't know if I can dynamically access the original topic name with some kind of alias or not. – Vektor88 Jun 06 '22 at 13:55
  • I don't think that's the topic name you're looking at. Look at the schema that's registered in the registry, and look at the `name` attribute of the `type:Record` schema – OneCricketeer Jun 06 '22 at 13:57
  • Transform can't be dynamic. That's why I said you'll have to modify the include list and use different connectors for different table types, so you'll have both "aws-db-connector-int" and "aws-db-connector-binary" running – OneCricketeer Jun 06 '22 at 14:00
  • The name is simply `Key` for key schemas, and `Value` for value schemas. What changes dynamically is the subject. In value schemas RecordName matches with the old topic name, which includes the original table name. In key schemas RecordName has the same value of the new topic name, and it doesn't include the original table name. I would've expected a 1:1 match in naming conventions for key and value subjects, but it doesn't seem to be the case. – Vektor88 Jun 06 '22 at 14:17
  • If your table names matched some pattern then the include list could use a regex... But other option - does the reroute transform affect the subject name? If so, should `key.field.replacement` include more of the capturing groups – OneCricketeer Jun 06 '22 at 14:17
  • I think you are confusing the purpose of those parameters. The `key.field.*` parameters are needed to add one extra field inside the key, named `table`, that should contain the table name (group 3 in the regex). Those parameters are not related to how the subject of the key schema is named. You can see more detail in the Debezium documentation https://access.redhat.com/documentation/fr-fr/red_hat_integration/2020-q2/html/debezium_user_guide/configuring-debezium-connectors-for-your-application#ensuring-unique-keys-across-records-routed-to-the-same-topic – Vektor88 Jun 06 '22 at 14:20
  • @OneCricketeer I think I've solved my issue, I left an answer below. – Vektor88 Jun 07 '22 at 08:37

1 Answer


This appears to be how Debezium's ByLogicalTableRouter works by default: it creates only one key schema per topic (but distinct value schemas), so all messages rerouted to the same topic are expected to share the same key structure.

To solve the issue, RegexRouter should be used instead. Applying an InsertField transformation before rerouting also adds the original topic name to the key, so the table name can still be extracted from it.

"transforms": "InsertField,Reroute",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Key",
"transforms.InsertField.topic.field": "table",
"transforms.Reroute.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.Reroute.regex": "(.*\\S)\\.(.*\\S)\\.(.*\\S)",
"transforms.Reroute.replacement": "$2_schema"
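A Python sketch of the effect of this chain on a single record (field and topic names taken from the config above; the real transforms operate on Connect `SourceRecord`s, not dicts):

```python
import re

def insert_field_key(topic, key):
    # InsertField$Key with topic.field=table: copies the record's current
    # topic (still the original Debezium name at this point in the chain)
    # into a "table" field on the key.
    return {**key, "table": topic}

def regex_router(topic):
    # RegexRouter renames only the topic; unlike ByLogicalTableRouter it
    # does not rewrite the key/value schemas or their record names, so
    # TopicRecordNameStrategy still sees per-table record names.
    return re.sub(r"(.*\S)\.(.*\S)\.(.*\S)", r"\2_schema", topic)

topic, key = "aws-db.db1.table3", {"id": b"\x01"}
key = insert_field_key(topic, key)  # table field holds the original topic
topic = regex_router(topic)         # record now lands in db1_schema
print(topic, key)
```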
Vektor88