
I have a scenario where I need to fetch millions of records from an Oracle database and send them to an Apache Kafka producer in chunks of 1,000.

On each subsequent fetch, I have to avoid pulling records that have already been pushed to Kafka, and select only new or updated records instead. It's a form of delta load processing.

Please let me know what approach I should follow for this scenario.


1 Answer

Use CDC (change data capture) to stream changes from a database such as Oracle into Kafka. You have a variety of options, including GoldenGate, DBVisit, Attunity, and more.

Alternatively, use the Kafka Connect JDBC source connector to stream records into Kafka, based on changes to an incrementing key or a timestamp column. This is not as scalable or flexible a solution as CDC, but it has the advantage of being free :)
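As a rough sketch of that approach (the connection details, table, and column names here are hypothetical; the keys themselves are the connector's documented configuration options), a JDBC source connector could be configured along these lines:

```properties
name=oracle-delta-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@//dbhost:1521/ORCL
connection.user=myuser
connection.password=mypassword
# timestamp+incrementing picks up both new rows (via the incrementing key)
# and updated rows (via the timestamp column), i.e. the delta load.
mode=timestamp+incrementing
incrementing.column.name=ID
timestamp.column.name=LAST_UPDATED
table.whitelist=MY_TABLE
topic.prefix=oracle-
# Rows fetched per poll; roughly matches the 1000-record chunking requirement.
batch.max.rows=1000
poll.interval.ms=5000
```

The connector tracks the highest key/timestamp it has published, so rows already sent to Kafka are not re-read on the next poll.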

Robin Moffatt
  • Yeah, this is one of the approaches, but I am looking for something completely in Java code. I am trying to avoid third-party tools or connectors. What approach should I go for? – user8873557 Nov 02 '17 at 09:28
  • 1
    > I am looking for something completely in java code. I am trying to avoid 3rd party tools or connectors Why? You're talking about rewriting something that exists already, in the form of Kafka Connect (or, CDC if you want to go directly to the transaction log) – Robin Moffatt Nov 02 '17 at 10:45
  • Yes, because we have some constraints on using third-party tools. Any suggestions, please, on how to fetch and split the records before sending them to the producer? (See the sketch after these comments.) – user8873557 Nov 02 '17 at 10:47
  • So no third-party tools at all, or just no commercial tools? – Robin Moffatt Nov 02 '17 at 10:51
  • Actually, we are trying to build a new tool specific to our requirements, so we have been given a strict no to any kind of third-party tools :) ... – user8873557 Nov 02 '17 at 10:53
  • 1
    are you allowed to use computers? ;-) Sounds like a rather restrictive and pointless requirement TBH. Kafka Connect is an API, if you're using Kafka, absolutely no reason why you wouldn't use one of its APIs. – Robin Moffatt Nov 02 '17 at 10:57
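For the plain-Java route the comments ask about, a minimal sketch of the incrementing-key/timestamp idea might look like the following. The table MY_TABLE, its ID and LAST_UPDATED columns, and the loadWatermark/saveWatermark helpers are all hypothetical placeholders (and the Oracle JDBC driver is assumed to be on the classpath); the point is the pattern, not a production implementation:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OracleDeltaLoader {

    private static final int CHUNK_SIZE = 1000;

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Last LAST_UPDATED value that was successfully published (hypothetical helper).
        Timestamp watermark = loadWatermark();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "myuser", "mypassword");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, payload, last_updated FROM my_table"
                     + " WHERE last_updated > ? ORDER BY last_updated, id")) {

            // Only select rows changed since the last run (the delta),
            // ordered so the watermark can advance monotonically.
            ps.setTimestamp(1, watermark);
            // Stream rows from Oracle in chunks instead of loading millions into memory.
            ps.setFetchSize(CHUNK_SIZE);

            Timestamp maxSeen = watermark;
            int inChunk = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    producer.send(new ProducerRecord<>("my_topic",
                            rs.getString("id"), rs.getString("payload")));
                    maxSeen = rs.getTimestamp("last_updated");
                    if (++inChunk == CHUNK_SIZE) {
                        producer.flush();        // make the 1000-record chunk durable in Kafka
                        saveWatermark(maxSeen);  // then record progress so reruns skip these rows
                        inChunk = 0;
                    }
                }
            }
            producer.flush();
            saveWatermark(maxSeen); // cover the final partial chunk
        }
    }

    // Hypothetical persistence of the watermark (a file, a control table, etc.).
    private static Timestamp loadWatermark() { return new Timestamp(0L); }

    private static void saveWatermark(Timestamp ts) { /* persist ts somewhere durable */ }
}
```

One caveat with a pure timestamp watermark: if several rows share the same LAST_UPDATED value and the job fails mid-chunk, the strict > comparison can skip the unsent rows on restart. Tracking the incrementing key together with the timestamp (as the connector's timestamp+incrementing mode does) avoids this.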