I have created a Debezium Kafka connector using KSQLDB.
Every time a row is removed from a table, Debezium sends a tombstone like this (for example):
KEY: Struct(cliente_cod=0000) | BODY: null
When I materialize a row in a table (with KSQLDB), I have the following columns (for example):
ID: 0000 | NAME: xxxx | SURNAME: xxxx
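For reference, this is roughly how such a table could be declared in KSQLDB; a minimal sketch, where the table name, topic name, and formats are assumptions:

-- Hypothetical materialized table over the Debezium topic (names/formats assumed)
CREATE TABLE CLIENTES (
  ID STRING PRIMARY KEY,  -- must equal the record key for a tombstone to delete the row
  NAME STRING,
  SURNAME STRING
) WITH (
  KAFKA_TOPIC = 'dbserver.public.clientes',
  VALUE_FORMAT = 'AVRO'
);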
Without any transformation, the id in the tombstone (Struct(cliente_cod=0000)) and the id in the table (0000) won't match, so the row won't be removed. Obviously we could just store the whole Struct(cliente_cod=0000) as the id of the table, but that could be problematic if you need to join with other tables.
Rekeying via streams (with PARTITION BY, for example) doesn't help either: tombstones will be ignored because null is not valid content for a stream (streams don't know anything about tombstones; a tombstone is just a concept for materialized views).
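For example, a rekeying stream like the following (stream and column names are assumptions) would silently drop the tombstones, since their value is null:

-- Hypothetical rekey: a STREAM ignores null-valued records,
-- so the deletes never reach the downstream table.
CREATE STREAM CLIENTES_REKEYED AS
  SELECT *
  FROM CLIENTES_STREAM
  PARTITION BY ID;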
A good solution could be to add transformations (here is an example for the previous case, in the KSQLDB connector definition):
"transforms.extractClienteKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractClienteKey.field" = 'cliente_cod',
"transforms.extractClienteKey.predicate" = 'IsClienteTopic',
That's fine and it works; tombstone keys will be transformed (no Struct wrapper):
KEY: 0000 | BODY: null
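Note that the predicate referenced above also has to be declared in the connector definition; a sketch using the standard TopicNameMatches predicate, where the topic pattern is an assumption:

"predicates" = 'IsClienteTopic',
"predicates.IsClienteTopic.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.IsClienteTopic.pattern" = '.*clientes',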
The problem comes when your DB has a lot of tables with different primary key names; let's say you have 30 tables with PK names such as client_id, user_id, etc. In this case, in order to use ExtractField$Key you need to discriminate by topic and apply a different transformation for each topic.
That works too; the problem appears when you try to run more than 10 transforms per connector in Confluent Cloud (the service is limited to 10).
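To make the scaling concrete, each table needs its own transform/predicate pair, something like the following (all names here are illustrative):

"transforms" = 'extractClienteKey,extractUserKey',
"transforms.extractClienteKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractClienteKey.field" = 'cliente_cod',
"transforms.extractClienteKey.predicate" = 'IsClienteTopic',
"transforms.extractUserKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractUserKey.field" = 'user_id',
"transforms.extractUserKey.predicate" = 'IsUserTopic',
-- ...and so on for each of the 30 tables, far beyond the 10-transform limit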
And here are my questions:
- Is there a way to configure a Debezium (or any Kafka Connect) connector to send 0000 instead of Struct(id=0000) without applying transforms?
- What is the proper way to deal with Debezium tombstones and KSQLDB tables? Is transformation the only way, or is there any alternative?