
When reading about Kafka and how to get data from Kafka into a queryable database suited for some specific task, there is usually mention of Kafka Connect sinks. That sounds like the way to go if I need to get data from Kafka into a search index like Elasticsearch, or into analytics like Hadoop or Spark, where a Kafka Connect sink is already available.

But my question is: what is the best way to handle a store that isn't as popular, say MyImaginaryDB, where the only way I can get to it is through some API, and the data needs to be handled securely and reliably, as well as decently transformed, before inserting? Is it recommended to:

  1. Just have the API consume from Kafka and use the MyImaginaryDB driver to write (roughly sketched just after this list)
  2. Figure out how to build a custom Kafka Connect sink (assuming it can handle schemas, authentication/authorization, retries, fault-tolerance, transforms and post-processing needed before landing in MyImaginaryDB)
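
To make option 1 concrete, I imagine it would look roughly like the sketch below; MyImaginaryDbClient is a made-up stand-in for whatever driver/SDK the store actually exposes.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class MyImaginaryDbWriter {

        /** Made-up stand-in for MyImaginaryDB's driver/SDK. */
        static class MyImaginaryDbClient {
            MyImaginaryDbClient(String url, String apiKey) { /* authenticate against the API */ }
            void insert(String doc) { /* call the store's write endpoint */ }
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "myimaginarydb-writer");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("enable.auto.commit", "false");  // commit only after a successful write

            MyImaginaryDbClient db = new MyImaginaryDbClient("https://myimaginarydb.example", "api-key");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("source-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // transform and write; retries, batching and error handling are all on me here
                        db.insert(transform(record.value()));
                    }
                    consumer.commitSync();
                }
            }
        }

        private static String transform(String value) {
            return value;  // placeholder for whatever reshaping MyImaginaryDB needs
        }
    }

The obvious worry with that approach is that retries, batching, schema handling and offset management all become my problem.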

I have also been reading about KSQL and Kafka Streams and am wondering whether they help with transforming the data before it is sent to the end store.

atkayla
  • You may get a better answer if you tell us what database you intend to use. Otherwise it might get closed as "Unclear what you are asking" or "too broad". – Thilo Jun 23 '19 at 08:19
  • It depends on your general situation. If you already use Connect in several other places, it keeps your system consistent to do this with Connect as well. If you have no experience with it, you'd need to make the strategic decision whether you want to start. Just writing a simple consumer is definitely the simplest solution. – daniu Jun 23 '19 at 08:29

1 Answer


Option 2, definitely. Just because there isn't an existing sink connector doesn't mean Kafka Connect isn't for you. If you're going to be writing some code anyway, it still makes sense to hook into the Kafka Connect framework. Kafka Connect handles all the common stuff (schemas, serialisation, restarts, offset tracking, scale-out, parallelism, etc.) and leaves you to implement just the bit that gets the data into MyImaginaryDB.
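
To give you a feel for how little you actually have to write, here's a rough skeleton of what such a sink could look like. The Connect classes are the real API; MyImaginaryDbClient and the config keys are made-up placeholders.

    import java.util.Collection;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.Task;
    import org.apache.kafka.connect.sink.SinkConnector;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    /** Skeleton connector: Connect calls this to validate config and spawn tasks. */
    public class MyImaginaryDbSinkConnector extends SinkConnector {
        private Map<String, String> config;

        @Override public void start(Map<String, String> props) { this.config = props; }
        @Override public Class<? extends Task> taskClass() { return MyImaginaryDbSinkTask.class; }
        @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
            return Collections.nCopies(maxTasks, config);  // every task gets the same config
        }
        @Override public void stop() { }
        @Override public ConfigDef config() {
            return new ConfigDef()
                .define("myimaginarydb.url", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "Endpoint URL")
                .define("myimaginarydb.api.key", ConfigDef.Type.PASSWORD, ConfigDef.Importance.HIGH, "API key");
        }
        @Override public String version() { return "0.1.0"; }
    }

    /** Skeleton task: the only real work is turning SinkRecords into calls to the store. */
    class MyImaginaryDbSinkTask extends SinkTask {
        private MyImaginaryDbClient client;  // made-up driver for the store

        @Override public void start(Map<String, String> props) {
            client = new MyImaginaryDbClient(props.get("myimaginarydb.url"),
                                             props.get("myimaginarydb.api.key"));
        }

        @Override public void put(Collection<SinkRecord> records) {
            for (SinkRecord record : records) {
                // record.value() is already deserialised by the worker's converter;
                // throwing a RetriableException here makes Connect redeliver the batch
                client.insert(record.value());
            }
        }

        @Override public void stop() { /* close the client */ }
        @Override public String version() { return "0.1.0"; }
    }

    /** Made-up stand-in for MyImaginaryDB's driver/SDK. */
    class MyImaginaryDbClient {
        MyImaginaryDbClient(String url, String apiKey) { /* authenticate */ }
        void insert(Object doc) { /* write to the store's API */ }
    }

You'd package that as a JAR, put it on the Connect worker's plugin path, and then configure and run it like any other connector.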

As regards transformations, the standard pattern is either:

  • Use Single Message Transform for lightweight stuff
  • Use Kafka Streams/KSQL and write back to another topic, which is then routed through Kafka Connect to the target
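
To illustrate the second bullet, a small Kafka Streams app (topic names are made up here) can do the heavier reshaping and write the result to a topic that the sink connector then reads:

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;

    public class TransformForMyImaginaryDb {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "myimaginarydb-transform");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> source = builder.stream("raw-events");

            source
                .filter((key, value) -> value != null)    // drop tombstones/rubbish
                .mapValues(value -> reshape(value))       // whatever the target store needs
                .to("events-for-myimaginarydb", Produced.with(Serdes.String(), Serdes.String()));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }

        private static String reshape(String value) {
            return value.trim();  // placeholder transformation
        }
    }

That keeps the transformation logic and the delivery to MyImaginaryDB as separate pieces that you can scale and evolve independently.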

If you try to build your own app doing (transformation + data sink) then you're munging together responsibilities, and you're reinventing a chunk of a wheel that already exists (integration with an external system in a reliable, scalable way).

You might find this talk useful for background about what Kafka Connect can do: http://rmoff.dev/ksldn19-kafka-connect

Robin Moffatt
  • Wow, you’re the guy whose face is on all the Confluent articles! Thanks, I really enjoyed the talk and learned a lot! For my specific use case, I plan to integrate with blockchains (no premade sinks available yet :P). Usually you would write a REST/GraphQL API that passes through some JSON/string/buffer to a chaincode function via the blockchain’s driver/SDK library, so it seems like that API part can be abstracted into a Kafka Connect sink as long as there’s a Java driver for that particular blockchain. Time to learn Java/Kafka/Kafka Connect! :) – atkayla Jun 23 '19 at 19:26