
In my problem I need to query a database and join the query results with a Kafka data stream in Flink. Currently this is done by storing the query results in a file and then using Flink's readFile functionality to create a DataStream of the query results. What would be a better approach that bypasses the intermediate step of writing to a file and creates a DataStream directly from the query results?

My current understanding is that I would need to write a custom SourceFunction, as suggested here. Is this the right (and only) way, or are there alternatives?
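For context, the kind of source I have in mind is roughly the following sketch, assuming a JDBC-accessible database; the class name, query, and connection details are placeholders, and it does no checkpointing, so it is nowhere near production quality:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Emits database rows directly into a DataStream, skipping the file step.
public class JdbcQuerySource extends RichSourceFunction<String> {
    private volatile boolean running = true;
    private transient Connection connection;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Connection details are placeholders.
        connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM users")) {
            while (running && rs.next()) {
                // Emit each row; in practice you would map to a POJO.
                ctx.collect(rs.getLong("id") + "," + rs.getString("name"));
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void close() throws Exception {
        if (connection != null) connection.close();
    }
}
```

It would then be attached with something like env.addSource(new JdbcQuerySource()).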

Are there any good resources for writing custom SourceFunctions, or should I just look at existing implementations for reference and customise them for my needs?

Amuoeba

1 Answer


One straightforward solution would be to use a lookup join, perhaps with caching enabled.
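A minimal sketch of what that could look like with the Table API, assuming a Kafka topic of orders enriched against a JDBC customers table; all table names, connection options, and the legacy 'lookup.cache.*' settings of the Flink JDBC connector are illustrative:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Kafka stream; a processing-time attribute is required for a lookup join.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  order_id BIGINT," +
            "  customer_id BIGINT," +
            "  proc_time AS PROCTIME()" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'" +
            ")");

        // JDBC table used as the lookup side, with caching enabled.
        tEnv.executeSql(
            "CREATE TABLE customers (" +
            "  customer_id BIGINT," +
            "  name STRING" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
            "  'table-name' = 'customers'," +
            "  'lookup.cache.max-rows' = '10000'," +
            "  'lookup.cache.ttl' = '10min'" +
            ")");

        // Each Kafka record looks up its matching database row at processing time.
        tEnv.executeSql(
            "SELECT o.order_id, c.name " +
            "FROM orders AS o " +
            "JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c " +
            "ON o.customer_id = c.customer_id").print();
    }
}
```

With the cache enabled, repeated lookups for the same key are served from memory until the TTL expires, trading some staleness for far fewer round trips to the database.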

Other possible solutions include Kafka Connect, or using something like Debezium to mirror the database table into Flink. Here's an example: https://github.com/ververica/flink-sql-CDC.
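As a rough sketch of the mirroring approach, in the spirit of that linked example: the 'mysql-cdc' connector comes from the flink-cdc-connectors project, and the host, credentials, and table names below are placeholders.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcMirrorExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // The table is kept in sync with the database via the MySQL binlog,
        // so a regular streaming join against it observes every change.
        tEnv.executeSql(
            "CREATE TABLE customers_mirror (" +
            "  customer_id BIGINT," +
            "  name STRING," +
            "  PRIMARY KEY (customer_id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +
            "  'hostname' = 'localhost'," +
            "  'port' = '3306'," +
            "  'username' = 'flink'," +
            "  'password' = 'secret'," +
            "  'database-name' = 'mydb'," +
            "  'table-name' = 'customers'" +
            ")");
    }
}
```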

David Anderson
  • In addition to David's suggestion, which worked for me: I implemented something similar and indeed used a CDC source, which basically gives you a DataStream representing your rows (and their changes over time). After that you can create a stateful function to join the Kafka records with the CDC records, as sketched below; the only thing you need is to be able to key both streams by the same identifiers. – Dan Serb Apr 01 '22 at 08:00
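A minimal sketch of the stateful join described in that comment: both streams are keyed by the same identifier, the latest CDC row is held in keyed state, and each Kafka record is joined against it on arrival. The POJO types and field names here are hypothetical stand-ins.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichmentJoin extends KeyedCoProcessFunction<
        Long, EnrichmentJoin.KafkaEvent, EnrichmentJoin.DbRow, String> {

    // Hypothetical record types, stand-ins for your real POJOs.
    public static class KafkaEvent { public long key; public String payload; }
    public static class DbRow { public long key; public String attributes; }

    private transient ValueState<DbRow> latestRow;

    @Override
    public void open(Configuration parameters) {
        latestRow = getRuntimeContext().getState(
                new ValueStateDescriptor<>("latest-db-row", DbRow.class));
    }

    @Override
    public void processElement1(KafkaEvent event, Context ctx,
                                Collector<String> out) throws Exception {
        DbRow row = latestRow.value();
        if (row != null) {
            out.collect(event.payload + " enriched with " + row.attributes);
        }
        // If the row has not arrived yet, a production job might buffer the
        // event in state and flush it from processElement2.
    }

    @Override
    public void processElement2(DbRow row, Context ctx,
                                Collector<String> out) throws Exception {
        latestRow.update(row); // keep the most recent version for this key
    }
}
```

Wiring it up would look roughly like kafkaStream.keyBy(e -> e.key).connect(cdcStream.keyBy(r -> r.key)).process(new EnrichmentJoin()).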