
To enrich our data stream, we plan to connect a MySQL (MemSQL) server to our existing Flink streaming application.

Flink provides a JDBC connector for the Table API: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/

Additionally, I discovered another MySQL connector, Flink CDC (https://ververica.github.io/flink-cdc-connectors/master/content/about.html), which allows working with an external database in a streaming fashion.

What is the difference between them, and which is the better choice in my case?

1 Answer


Change Data Capture (CDC) connectors capture all changes that are happening in one or more tables. The schema usually has a before and an after record. The Flink CDC connectors can be used directly in Flink in an unbounded mode (streaming), without the need for something like Kafka in the middle.
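For illustration, here is a minimal sketch of such a CDC-backed table declared in Flink SQL. The connection settings, database, and column names are placeholders, not taken from the question:

```sql
-- Hypothetical CDC-backed table: every INSERT/UPDATE/DELETE in the MySQL
-- table arrives as a changelog row in this unbounded Flink table.
CREATE TABLE products_cdc (
    id INT,
    name STRING,
    price DECIMAL(10, 2),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',      -- from the flink-cdc-connectors project
    'hostname' = 'mysql-host',      -- placeholder host
    'port' = '3306',
    'username' = 'flink',           -- placeholder credentials
    'password' = 'secret',
    'database-name' = 'mydb',       -- placeholder database
    'table-name' = 'products'       -- placeholder table
);
```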

The regular JDBC connector can be used in bounded mode and as a lookup table.

If you're looking to enrich your existing stream, you most likely want to use the lookup functionality. That allows you to query a table for a specific key (coming from your stream) and enrich the stream with data from your table. Keep in mind that from a performance perspective you're best off using a temporal table join. See the example in https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/#how-to-create-a-jdbc-table
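As a rough sketch, assuming your incoming stream is registered as a table `stream_events` with a processing-time attribute `proc_time` (all names and connection settings below are placeholders):

```sql
-- JDBC table used as the lookup/dimension side of the join.
CREATE TABLE enrichment_dim (
    id INT,
    category STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://mysql-host:3306/mydb',  -- placeholder URL
    'table-name' = 'enrichment_dim',              -- placeholder table
    'username' = 'flink',                         -- placeholder credentials
    'password' = 'secret'
);

-- Lookup join: for each stream record, Flink queries the JDBC table by key
-- at processing time and attaches the matching columns to the stream row.
SELECT s.id, s.payload, d.category
FROM stream_events AS s
JOIN enrichment_dim FOR SYSTEM_TIME AS OF s.proc_time AS d
    ON s.id = d.id;
```

The JDBC connector also supports a lookup cache to reduce the number of queries hitting the database, at the cost of potentially serving slightly stale rows.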

Martijn Visser
  • So, as an option, I can add a JDBC table (for the table containing the enrichment data), convert my existing DataStream into the Table API, join these two tables, and then convert back into a classic DataStream to apply my process functions? I had planned to use flink-cdc -> convert into a DataStream -> union with the existing DataStream and do the enrichment via Flink state. – Sergey Postument Feb 08 '22 at 09:19
  • Will an event-time temporal join `https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#event-time-temporal-join` replace all the state management in the regular DataStream with a single SQL statement? (see the sketch after these comments) – Sergey Postument Feb 08 '22 at 09:41
  • The Flink docs say that for enrichment it is good to use a lookup join - `https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join` – Sergey Postument Feb 08 '22 at 10:16
  • Lookup joins incur considerable latency (though caching is supported). CDC-based joins require keeping the data in Flink state. Which works best is use-case dependent. – David Anderson Feb 08 '22 at 10:22
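Regarding the event-time temporal join raised in the comments, a minimal sketch, assuming the CDC table from above is declared as a versioned table (primary key plus a watermark on an event-time column such as an update timestamp) and that `stream_events` carries an event-time attribute `event_time`:

```sql
-- Event-time temporal join: each stream record is enriched with the version
-- of the dimension row that was valid at the record's event time.
-- Flink keeps the state needed for this join internally.
SELECT s.id, s.payload, p.name, p.price
FROM stream_events AS s
JOIN products_cdc FOR SYSTEM_TIME AS OF s.event_time AS p
    ON s.id = p.id;
```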