I have created a Debezium Kafka connector using KSQLDB.
Every time a row is removed from a table, Debezium sends a tombstone like this (for example):
KEY: Struct(cliente_cod=0000) | BODY: null
When I materialize a row in a table (with KSQLDB), I have the following columns (for example):
ID: 0000 | NAME: xxxx | SURNAME: xxxx
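For reference, this is roughly how such a table could be declared in KSQLDB; a minimal sketch, where the table name, topic name, and formats are assumptions:

-- Hypothetical materialized table over the Debezium topic (names/formats assumed)
CREATE TABLE CLIENTES (
  ID STRING PRIMARY KEY,  -- must equal the record key for a tombstone to delete the row
  NAME STRING,
  SURNAME STRING
) WITH (
  KAFKA_TOPIC = 'dbserver.public.clientes',
  VALUE_FORMAT = 'AVRO'
);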
Without any transformation, the id in the tombstone (Struct(cliente_cod=0000)) and the id in the table (0000) won't match, so the row won't be removed. Obviously we could just store the whole Struct(cliente_cod=0000) as the id of the table, but that could be problematic if you need to join with other tables.
Rekeying via streams (with PARTITION BY, for example) doesn't help either: tombstones will be ignored because null is not valid content for a stream (streams don't know anything about tombstones; a tombstone is just a concept for materialized views).
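For example, a rekeying stream like the following (stream and column names are assumptions) would silently drop the tombstones, since their value is null:

-- Hypothetical rekey: a STREAM ignores null-valued records,
-- so the deletes never reach the downstream table.
CREATE STREAM CLIENTES_REKEYED AS
  SELECT *
  FROM CLIENTES_STREAM
  PARTITION BY ID;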
A good solution could be to add transformations (here is an example for the previous case, in the KSQLDB connector definition):
"transforms.extractClienteKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractClienteKey.field" = 'cliente_cod',
"transforms.extractClienteKey.predicate" = 'IsClienteTopic',
That's fine and it works; tombstone keys will be transformed (no Struct wrapper):
KEY: 0000 | BODY: null
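Note that the predicate referenced above also has to be declared in the connector definition; a sketch using the standard TopicNameMatches predicate, where the topic pattern is an assumption:

"predicates" = 'IsClienteTopic',
"predicates.IsClienteTopic.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.IsClienteTopic.pattern" = '.*clientes',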
The problem comes when your DB has a lot of tables with different primary key names; let's say you have 30 tables with PK names such as client_id, user_id, etc. In this case, in order to use ExtractField$Key you need to discriminate by topic and apply a different transformation for each topic.
That works too; the problem appears when you try to run more than 10 transforms per connector in Confluent Cloud (the service is limited to 10).
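To make the scaling concrete, each table needs its own transform/predicate pair, something like the following (all names here are illustrative):

"transforms" = 'extractClienteKey,extractUserKey',
"transforms.extractClienteKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractClienteKey.field" = 'cliente_cod',
"transforms.extractClienteKey.predicate" = 'IsClienteTopic',
"transforms.extractUserKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractUserKey.field" = 'user_id',
"transforms.extractUserKey.predicate" = 'IsUserTopic',
-- ...and so on for each of the 30 tables, far beyond the 10-transform limit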
And here are my questions:
- Is there a way to configure a Debezium (or any Kafka Connect) connector to send 0000 instead of Struct(id=0000) without applying transforms?
- What is the proper way to deal with Debezium tombstones and KSQLDB tables? Is transformation the only way, or is there any alternative?