1

Setup: I have CDC enabled on MS SQL Server and the CDC events are fed to Kafka using debezium kafka connect(source). Also more than one table CDC events are routed to a single topic in Kafka.

Question: Since I have more than one table data in the kafka topic, I would like to have the table name and the database name in the CDC data.

I am getting the table name and database name in MySQL CDC but not in MS SQL CDC.

Below is the Debezium source connector for the SQL Server

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "cdc-user_profile-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "database.hostname": "<<hostname>>",
    "database.port": "<<port>>",
    "database.user": "<<username>>",
    "database.password": "<<password>>",
    "database.server.name": "test",
    "database.dbname": "testDb",
    "table.whitelist": "schema01.table1,schema01.table2",
    "database.history.kafka.bootstrap.servers": "broker:9092",
    "database.history.kafka.topic": "digital.user_profile.schema.audit",
    "database.history.store.only.monitored.tables.ddl": true,
    "include.schema.changes": false,
    "event.deserialization.failure.handling.mode": "fail",
    "snapshot.mode": "initial_schema_only",
    "snapshot.locking.mode": "none",
    "transforms":"addStaticField,topicRoute",
    "transforms.addStaticField.type":"org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addStaticField.static.field":"source_system",
    "transforms.addStaticField.static.value":"source_system_1",
    "transforms.topicRoute.type":"org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.topicRoute.regex":"(.*)",
    "transforms.topicRoute.replacement":"digital.user_profile",
    "errors.tolerance": "none",
    "errors.log.enable": true,
    "errors.log.include.messages": true,
    "errors.retry.delay.max.ms": 60000,
    "errors.retry.timeout": 300000
  }
}'

I am getting the below output (Demo data)

{
  "before": {
    "profile_id": 147,
    "email_address": "test@gmail.com"
  },
  "after": {
    "profile_id": 147,
    "email_address": "test_modified@gmail.com"
  },
  "source": {
    "version": "0.9.4.Final",
    "connector": "sqlserver",
    "name": "test",
    "ts_ms": 1556723528917,
    "change_lsn": "0007cbe5:0000b98c:0002",
    "commit_lsn": "0007cbe5:0000b98c:0003",
    "snapshot": false
  },
  "op": "u",
  "ts_ms": 1556748731417,
  "source_system": "source_system_1"
}

My requirement is to get as below

{
  "before": {
    "profile_id": 147,
    "email_address": "test@gmail.com"
  },
  "after": {
    "profile_id": 147,
    "email_address": "test_modified@gmail.com"
  },
  "source": {
    "version": "0.9.4.Final",
    "connector": "sqlserver",
    "name": "test",
    "ts_ms": 1556723528917,
    "change_lsn": "0007cbe5:0000b98c:0002",
    "commit_lsn": "0007cbe5:0000b98c:0003",
    "snapshot": false,
    "db": "testDb",
    "table": "table1/table2"
  },
  "op": "u",
  "ts_ms": 1556748731417,
  "source_system": "source_system_1"
}
RobC
  • 22,977
  • 20
  • 73
  • 80
Philip
  • 13
  • 4

2 Answers2

0

Debezium Kafka-Connect generally puts data from each table in a separate topic and the topic name is of the format hostname.database.table. We generally use the topic name to distinguish between the source table & database name.

If you are putting the data from all the tables manually into one topic then you might have to add the table and database name manually as well.

Aman Saurav
  • 751
  • 9
  • 28
  • How would i add it manually in CDC event ?.The reason for adding more than one table data to kafka topic is because this is an enterprise datapipeline and there might be 100s of tables. If we keep increasing the topics like that then performance of the kafka cluster will deteriorate. Also kafka connect provides this feature via setting these two properties CONNECT_KEY_CONVERTER_KEY_SUBJECT_NAME_STRATEGY: "io.confluent.kafka.serializers.subject.TopicRecordNameStrategy" CONNECT_VALUE_CONVERTER_VALUE_SUBJECT_NAME_STRATEGY: "io.confluent.kafka.serializers.subject.TopicRecordNameStrategy". – Philip May 02 '19 at 15:37
0

This is planned as a part of https://issues.jboss.org/browse/DBZ-875 issue

Jiri Pechanec
  • 1,816
  • 7
  • 8