
I'm trying to create a table with Flink's Table API that uses a Debezium source function. I found an implementation of these functions at https://github.com/ververica/flink-cdc-connectors and used it like this in my code:

val debeziumProperties = new Properties()
debeziumProperties.setProperty("plugin.name", "wal2json")
debeziumProperties.setProperty("format", "debezium-json")

val sourceFunction: DebeziumSourceFunction[TestCharge] = PostgreSQLSource.builder()
  .hostname("******")
  .port(5432)
  .database("*****")
  .username("*****")
  .password("*****")
  .debeziumProperties(debeziumProperties)
  .deserializer(new CustomDebeziumDeserializer) // converts SourceRecord to TestCharge
  .build()

val env = StreamExecutionEnvironment.getExecutionEnvironment
val sSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
val sTableEnv = StreamTableEnvironment.create(env, sSettings)

val cdcStream: DataStream[TestCharge] = env
  .addSource(sourceFunction)
  .map(x => x)

sTableEnv.createTemporaryView("historic", cdcStream, 'chargeId, 'email, 'amount, 'cardHash)
val table: Table = sTableEnv.sqlQuery("SELECT SUM(amount) FROM historic GROUP BY chargeId")

val reverse = sTableEnv.toRetractStream[Row](table)

reverse.print()

I also added this dependency, as described in the documentation:

"com.alibaba.ververica" % "flink-sql-connector-postgres-cdc" % "1.1.0"

When I run the job locally on the mini-cluster it works fine, but on a Flink cluster provisioned on Kubernetes it throws this exception:

Caused by: io.debezium.DebeziumException: No implementation of Debezium engine builder was found

Does anyone know what could be happening, or whether I'm missing some dependency?

Thanks in advance.

whiteskull

1 Answer


If you want to use it with the Table API / SQL, you can just register the table using SQL DDL:

sTableEnv.executeSql(
      """
        |CREATE TABLE shipments (
        |  shipment_id INT,
        |  order_id INT,
        |  origin STRING,
        |  destination STRING,
        |  is_arrived BOOLEAN
        |) WITH (
        |  'connector' = 'postgres-cdc',
        |  'hostname' = 'localhost',
        |  'port' = '5432',
        |  'username' = 'postgres',
        |  'password' = 'postgres',
        |  'database-name' = 'postgres',
        |  'schema-name' = 'public',
        |  'table-name' = 'shipments'
        |)
        |""".stripMargin)
// then you can query the table
  val table: Table = sTableEnv.sqlQuery("SELECT SUM(shipment_id) FROM shipments GROUP BY order_id")
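From there you can consume the result just like in the question, e.g. as a retract stream. A minimal sketch, reusing the env and sTableEnv from the question (the job name string is arbitrary):

// Convert the continuously updating aggregate into a retract stream and print it.
val reverse = sTableEnv.toRetractStream[Row](table)
reverse.print()

// Trigger execution of the streaming job.
env.execute("postgres-cdc-aggregation")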

This is the easiest way to work with a CDC source, because currently the Table API doesn't support converting a changelog stream into a Table.

Regarding your problem, I think it might be caused by a dependency conflict. Please check whether you are depending on another version of debezium-embedded; if so, remove it. flink-sql-connector-postgres-cdc already packages it (version 1.12).
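If the conflicting debezium-embedded comes in transitively, you can exclude it in sbt. A minimal sketch, where "some.group" % "other-lib" is a hypothetical placeholder for whatever dependency pulls it in:

// build.sbt: exclude a transitively included debezium-embedded
// ("some.group" % "other-lib" is a placeholder, not a real artifact).
libraryDependencies += ("some.group" % "other-lib" % "1.0.0")
  .exclude("io.debezium", "debezium-embedded")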

Jark Wu
  • Can you please provide a reference for your statement "the Table API doesn't support converting a changelog stream into a Table"? – python_enthusiast Jan 26 '21 at 21:58
  • @python_enthusiast this is a new feature under development, hopefully this will be released in the next 1.13 version. See https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API – Jark Wu Jan 28 '21 at 02:27
  • @JarkWu - I need to create a data stream from a Postgres data source, and my result set is a join of three tables. Can you share an example of such a scenario? – Swapnil Khante May 11 '21 at 07:19