I have a Debezium connector (running on a ksqlDB server) that streams values from SQL Server CDC tables to Kafka topics. I'm trying to transform the key of my message from JSON to an integer value. An example key I'm receiving looks like this: {"InternalID":11117}, and I want to represent it as just the number 11117. According to the Kafka Connect documentation this should be fairly easy with the ExtractField SMT. However, when I configure my connector to use this transform I receive the error Caused by: java.lang.IllegalArgumentException: Unknown field: InternalID.

Connector config:

CREATE SOURCE CONNECTOR properties_sql_connector WITH (
'connector.class'= 'io.debezium.connector.sqlserver.SqlServerConnector', 
'database.hostname'= 'propertiessql', 
'database.port'= '1433', 
'database.user'= 'XXX', 
'database.password'= 'XXX', 
'database.dbname'= 'Properties', 
'database.server.name'= 'properties', 
'table.exclude.list'= 'dbo.__EFMigrationsHistory', 
'database.history.kafka.bootstrap.servers'= 'kafka:9091', 
'database.history.kafka.topic'= 'dbhistory.properties',
'key.converter.schemas.enable'= 'false',
'transforms'= 'unwrap,extractField',
'transforms.unwrap.type'= 'io.debezium.transforms.ExtractNewRecordState',
'transforms.unwrap.delete.handling.mode'= 'none',
'transforms.extractField.type'= 'org.apache.kafka.connect.transforms.ExtractField$Key',
'transforms.extractField.field'= 'InternalID',
'key.converter'= 'org.apache.kafka.connect.json.JsonConverter');

Error details:

--------------------------------------------------------------------------------------------------------------------------------------
 0       | FAILED | org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:223) 
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:149)
        at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:355)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:258)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Unknown field: InternalID
        at org.apache.kafka.connect.transforms.ExtractField.apply(ExtractField.java:65)
        at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:173)       
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:207) 
        ... 11 more

Any ideas why this transform is failing? Am I missing some configuration? When the extractField transform is removed, the key of my message looks like the above: {"InternalID":11117}

msz

2 Answers

By default, when you configure SMTs for any connector, including Debezium, the transformation is applied to every record the connector emits. That includes change event messages that might not contain the retrieved data and might not have the necessary fields. To solve this, apply your SMTs selectively, to only a specific subset of the change event messages Debezium generates, by using SMT predicates.

The official documentation is located here.

In your specific case, you could apply the SMT only to the output topic for that specific database table; it would look something like this:

# Create a predicate that matches your output 
predicates: topicNameMatch
predicates.topicNameMatch.type: org.apache.kafka.connect.transforms.predicates.TopicNameMatches
predicates.topicNameMatch.pattern: *output topic name goes here*

# Your logic to extract the field from the key
transforms.extractField.type: org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractField.field: InternalID

# This references the predicate above
transforms.extractField.predicate: topicNameMatch
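
Translated into the ksqlDB CREATE SOURCE CONNECTOR syntax used in the question, the relevant properties would look roughly like the sketch below. The pattern 'properties\.dbo\..*' is an assumption based on Debezium's default serverName.schemaName.tableName topic naming; substitute your actual output topic name.

'transforms'= 'unwrap,extractField',
'predicates'= 'topicNameMatch',
'predicates.topicNameMatch.type'= 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
'predicates.topicNameMatch.pattern'= 'properties\.dbo\..*',
'transforms.extractField.type'= 'org.apache.kafka.connect.transforms.ExtractField$Key',
'transforms.extractField.field'= 'InternalID',
'transforms.extractField.predicate'= 'topicNameMatch',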

There are other predicates located in the documentation listed above if the topic name matching doesn't work for you.

dynamitem

In order to extract a named field from JSON, you'll need schemas.enable = 'true' for that converter.

For any data that's not sourced from Debezium, that'll require the JSON to carry a schema as part of the event.
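
For illustration, a schema-enabled JSON key wraps the value in Kafka Connect's schema/payload envelope, something like the sketch below (the int32 type is an assumption; it depends on your table's key column):

{
  "schema": {
    "type": "struct",
    "fields": [
      { "type": "int32", "optional": false, "field": "InternalID" }
    ],
    "optional": false
  },
  "payload": { "InternalID": 11117 }
}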

Or, if you're using the Schema Registry, switch to a different converter that uses it, and it should work.
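
For example, with Confluent's AvroConverter (the Schema Registry URL below is a placeholder for your environment):

'key.converter'= 'io.confluent.connect.avro.AvroConverter',
'key.converter.schema.registry.url'= 'http://schema-registry:8081',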

OneCricketeer
  • Thanks for the answer @OneCricketeer, I've tried enabling schemas for both my values and keys with `'key.converter.schemas.enable'= 'true',` and `'value.converter.schemas.enable'= 'true',` but it didn't help - same issue. I've also tried with Avro serialization and schema registry for my key (Value was already serialized to Avro previously) and it's still throwing the same exception. – msz Jun 23 '22 at 08:47
  • I would need to do more testing since I don't have a readily available SQLServer database, but it should work. However, I didn't think the debezium connector was able to produce structured keys, anyway. I thought it was just the primary key of the table. What is the schema of your table? – OneCricketeer Jun 24 '22 at 21:59