
I am migrating to Kafka as the broker and Debezium to move data (ETL) from all of our microservices to reporting and search databases.

Is there any way to configure Debezium so that it puts data on separate topics based on custom criteria (such as user, company, or some key column/attribute of the row)?

Zeeshan Bilal

3 Answers


Not sure if you are looking for Topic Routing.

Assuming you cannot add a filter option to Debezium itself, the typical pattern is to use Kafka Streams, KSQL (or Flink, based on your previous question) to filter and disperse the data you're interested in into the different topics that downstream consumers need.
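The Kafka Streams pattern described above boils down to choosing an output topic per record. A minimal sketch of just that topic-choosing logic follows; the `companyId` field name and the `events.*` topic naming scheme are assumptions, and the commented line shows where it would plug into a topology via `KStream#to(TopicNameExtractor)`:

```java
import java.util.Map;

// Hypothetical router: picks a destination topic from a record's company field.
// The field name "companyId" and topic prefix "events." are illustrative only.
class CompanyTopicRouter {

    // In a Streams app this would be the body of a TopicNameExtractor, e.g.:
    //   stream.to((key, value, recordContext) -> CompanyTopicRouter.topicFor(value));
    static String topicFor(Map<String, Object> value) {
        Object company = value.get("companyId"); // assumed routing column
        // Fall back to a catch-all topic when the field is missing.
        return company == null ? "events.unrouted" : "events." + company;
    }
}
```

The same function can be unit-tested without a running Kafka cluster, which is one advantage of keeping the routing decision in a small pure method.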

From a single Debezium configuration, though, you have to hardcode a namespace/collection/table. You would need multiple configurations to cover multiple of those.
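To illustrate the "one configuration per table" point, a Debezium MySQL connector config pinned to a single table might look like this (connector name, hosts, credentials, and the whitelisted table are placeholders; `table.whitelist` was the property name in Debezium at the time, later renamed `table.include.list`):

```json
{
  "name": "inventory-orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "table.whitelist": "inventory.orders",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

Each additional table you want on its own terms would get its own copy of this config with a different whitelist.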

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
  • I am facing the same issue as raised in the question. @cricket_007 I don't want to use the typical pattern you specified in the above answer; instead I want to set some configuration properties in the source so that Debezium pulls only data that matches my config. You are assuming that we cannot add a filter option to Debezium itself. Is that just an assumption, or does Debezium really not support it? – Shahid Ghafoor Jan 24 '19 at 11:23
  • @Shahid I have not found such an option in the Debezium documentation, as compared to the JDBC Source Connector, which supports adding arbitrary `WHERE` conditions – OneCricketeer Jan 24 '19 at 15:49

I'd suggest implementing a custom SMT (single message transform) that routes the records produced by the Debezium connector to the right topics. You can take Debezium's topic routing SMT, linked in the answer by cricket_007, as an example for your custom implementation. With the SourceRecord available, you can decide on the destination topic based on any of the captured table's column values.
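A sketch of the topic-derivation piece of such a custom SMT, kept as plain Java so it stands alone: inside the `apply` method of a `Transformation<SourceRecord>` you would read the routing column from the record's value `Struct` and return `record.newRecord(...)` with the derived topic. The routing column and the suffix scheme here are assumptions:

```java
// Hypothetical topic-derivation logic for a custom SMT. In the real
// Transformation#apply you would call this with record.topic() and the
// value of the routing column, then emit record.newRecord(newTopic, ...).
class RoutingSmtHelper {

    // Appends a sanitized routing value (e.g. a company id) to the base topic.
    static String deriveTopic(String baseTopic, Object routingValue) {
        if (routingValue == null) {
            return baseTopic; // rows without the column keep the original topic
        }
        // Keep only characters that are legal in Kafka topic names.
        String suffix = routingValue.toString().replaceAll("[^a-zA-Z0-9._-]", "_");
        return baseTopic + "." + suffix;
    }
}
```

Because the derivation is a pure function, it can be tested in isolation before you wire it into the Connect transformation.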

Kafka Streams or similar would work too, but I'd recommend looking into SMTs first due to their ease of operation (no separate process needed), and only looking for alternatives if SMTs are not sufficient.

Gunnar
  • To clarify this answer: you would have to inspect the record values, then override the `topic` parameter in the new record returned from the Transformation's `apply` method – OneCricketeer Jan 25 '19 at 22:23
  • @Gunnar I got a response from @YevaByzek: "I don't believe this is possible today. There's an open AK JIRA: https://issues.apache.org/jira/browse/KAFKA-5869 However, you can use the Kafka Streams API to do dynamic routing." – Zeeshan Bilal Jan 28 '19 at 08:11
  • I was planning to write a custom SMT similar to org.apache.kafka.connect.transforms.RegexRouter that allows a field name in the replacement. Can we do this, or is there an architectural limitation currently blocking such a thing? – Zeeshan Bilal Jan 28 '19 at 08:11
  • KAFKA-5869 discusses a ready-made SMT for that purpose, but nothing stops you from implementing a custom SMT for that purpose as of today. – Gunnar Jan 28 '19 at 08:20

Hi @Gunnar, for a similar requirement: is it possible to send the message from Debezium to different topics (single/multiple) based on a condition? E.g., events for table A go to topic A; events for table B go to topics B1 and B2; events for table C go to topics C1, C2, and B1; etc. In the Source Connector or RegexRouter there is only the option of setting one topic name in the class org.apache.kafka.connect.connector.ConnectRecord. Is there a way to set multiple topics, i.e. to send one event to different topics based on some business logic?

RSA
  • I think you'd be best off implementing a Kafka Streams application that does this routing based on the single topic Debezium uses for each table. Kafka Connect cannot write to multiple topics at once. – Gunnar Sep 27 '19 at 09:24
  • For that you would have to create multiple connectors: one connector per table, whitelisting the table and using your desired topic. – Zeeshan Bilal Sep 27 '19 at 10:13
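Following the Streams suggestion above, fanning one table's events out to several topics comes down to a table-to-topics mapping. A minimal sketch of that mapping, using the table and topic names from the question as an example (the Streams wiring itself needs the kafka-streams dependency and is only indicated in the comment):

```java
import java.util.List;
import java.util.Map;

// Hypothetical fan-out table: which topics each table's events should reach.
// Table and topic names mirror the example in the question above.
class FanOutRouter {
    static final Map<String, List<String>> ROUTES = Map.of(
        "A", List.of("topicA"),
        "B", List.of("topicB1", "topicB2"),
        "C", List.of("topicC1", "topicC2", "topicB1"));

    // In a Streams app you could call stream.to(topic) once per entry here,
    // or flatMap each record into one copy per destination topic.
    static List<String> topicsFor(String table) {
        return ROUTES.getOrDefault(table, List.of("unrouted"));
    }
}
```

Keeping the routing table in one place makes it easy to change which topics a table feeds without touching the connector configuration.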