
I am using the latest Flink (1.11.2) to work with a sample MySQL database; the database itself is working fine.

Additionally, I have added flink-connector-jdbc_2.11-1.11.2.jar, mysql-connector-java-8.0.21.jar, and postgresql-42.2.17.jar to {FLINK}/lib.

Here is my code:

from pyflink.dataset import ExecutionEnvironment
from pyflink.table import TableConfig, BatchTableEnvironment

T_CONFIG = TableConfig()
B_EXEC_ENV = ExecutionEnvironment.get_execution_environment()
B_EXEC_ENV.set_parallelism(1)
BT_ENV = BatchTableEnvironment.create(B_EXEC_ENV, T_CONFIG)

# Source table backed by the JDBC connector
ddl = """
        CREATE TABLE nba_player4 (
             first_name STRING,
             last_name STRING,
             email STRING,
             id INT
        ) WITH (
            'connector' = 'jdbc',
            'url' = 'jdbc:mysql://localhost:3306/inventory',
            'username' = 'root',
            'password' = 'debezium',
            'table-name' = 'customers'
        )
"""
BT_ENV.sql_update(ddl)

# Sink table that prints rows to stdout
sinkddl = """
        CREATE TABLE print_table (
            f0 INT,
            f1 INT,
            f2 STRING,
            f3 DOUBLE
        ) WITH (
            'connector' = 'print'
        )
"""
BT_ENV.sql_update(sinkddl)

BT_ENV.sql_query("SELECT first_name, last_name FROM nba_player4")
BT_ENV.execute("table_job")
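Note: print_table is declared above but never written to. A minimal sketch of wiring a query into it (the selected columns below are placeholders chosen only to match the sink's INT, INT, STRING, DOUBLE schema), with BT_ENV.execute(...) then submitting the insert as part of the job:

# Hypothetical wiring: columns are placeholders to fit print_table's schema.
BT_ENV.sql_update(
    "INSERT INTO print_table "
    "SELECT id, id, first_name, CAST(0.0 AS DOUBLE) FROM nba_player4"
)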

However, when running the code, it comes up with an error saying:

py4j.protocol.Py4JJavaError: An error occurred while calling o23.sqlQuery.
: org.apache.flink.table.api.ValidationException: SQL validation failed. findAndCreateTableSource failed.

Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSourceFactory' in
the classpath.

Reason: Required context properties mismatch.

The following properties are requested:
connector=jdbc
password=debezium
schema.0.data-type=VARCHAR(2147483647)
schema.0.name=first_name
schema.1.data-type=VARCHAR(2147483647)
schema.1.name=last_name
schema.2.data-type=VARCHAR(2147483647)
schema.2.name=email
schema.3.data-type=INT
schema.3.name=id
table-name=customers
url=jdbc:mysql://localhost:3306/inventory
username=root

The following factories have been considered:
org.apache.flink.connector.jdbc.table.JdbcTableSourceSinkFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory
org.apache.flink.table.filesystem.FileSystemTableFactory

Update:

This is my docker-compose file:

version: '2.1'
services:
  jobmanager:
    build: .
    image: flink:latest
    hostname: "jobmanager"
    expose:
      - "6123"
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: flink:latest
    expose:
      - "6121"
      - "6122"
    depends_on:
      - jobmanager
    command: taskmanager
    links:
      - jobmanager:jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  mysql:
    image: debezium/example-mysql
    ports:
     - "3306:3306"
    environment:
     - MYSQL_ROOT_PASSWORD=debezium
     - MYSQL_USER=mysqluser
     - MYSQL_PASSWORD=mysqlpw 

The docker ps command shows:

CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                                                            NAMES
cf84c84f7821        flink      "/docker-entrypoint.…"   2 minutes ago       Up 2 minutes        6121-6123/tcp, 8081/tcp                                          _taskmanager_1
09b19142d70a        flink      "/docker-entrypoint.…"   9 minutes ago       Up 9 minutes        6123/tcp, 0.0.0.0:8081->8081/tcp                                 _jobmanager_1
4ac01eb11bf7        debezium/example-mysql      "docker-entrypoint.s…"   3 days ago          Up 9 minutes        0.0.0.0:3306->3306/tcp, 33060/tcp                                keras-flask-dep

More info:

My current Flink environment in Docker is flink:scala_2.12-java8:

docker pull flink:scala_2.12-java8

The PyFlink JDBC connector is flink-connector-jdbc_2.11-1.11.2.jar from the Flink 1.11 release:

https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/jdbc.html

In order to use the JDBC library, I tried two ways:

  1. saving flink-connector-jdbc_2.11-1.11.2.jar into /usr/local/lib/python3.7/site-packages/flink/lib

  2. configuring the classpath in the Python app:

     base_dir = "/Users/huhu/Documents/projects/webapp/libs/"
     flink_jdbc_jar = f"file://{base_dir}flink-connector-jdbc_2.11-1.11.2.jar"

     BT_ENV.get_config().get_configuration().set_string("pipeline.jars", flink_jdbc_jar)


But I am still getting the same error.
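For reference, a minimal sketch of approach 2 that also puts the MySQL driver on the pipeline classpath; pipeline.jars takes a semicolon-separated list of jar URLs (the driver path below is an assumption based on the base_dir above):

# Register both the connector and the JDBC driver with the pipeline.
base_dir = "/Users/huhu/Documents/projects/webapp/libs/"
jars = ";".join([
    f"file://{base_dir}flink-connector-jdbc_2.11-1.11.2.jar",
    f"file://{base_dir}mysql-connector-java-8.0.21.jar",  # driver jar, assumed present
])
BT_ENV.get_config().get_configuration().set_string("pipeline.jars", jars)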

user824624
  • Have you tried adding the `flink-connector-jdbc_2.11-1.11.2.jar` into the flink classpath - `flink/lib` ? – Mikalai Lushchytski Oct 15 '20 at 13:41
  • @MikalaiLushchytski I had tried adding it to /usr/local/lib/python3.7/site-packages/flink/lib, do you mean this? – user824624 Oct 15 '20 at 14:41
  • I mean something you asked in a separate thread - https://stackoverflow.com/questions/64303382/how-to-configure-some-external-jars-library-to-the-flink-docker-container. Put the required dependencies on the Flink classpath, i.e. `flink/lib` inside the container (jobmanager and taskmanager). – Mikalai Lushchytski Oct 15 '20 at 14:50
  • I see what you mean; I tried it a minute ago, running the Flink docker container with a volume attached to /flink/lib on my system. The container was running well, but the code still fails with the same error. – user824624 Oct 16 '20 at 00:06
  • I have downloaded Flink 1.11.2 from https://ci.apache.org/projects/flink/flink-docs-release-1.11/try-flink/local_installation.html and added flink-connector-jdbc_2.11-1.11.2.jar to flink/lib, running the local Flink cluster, but I am still getting the same error. So weird. – user824624 Oct 16 '20 at 00:49
  • Is this an incompatibility issue between the Python 3 Flink library and flink-connector-jdbc_2.11-1.11.2.jar? – user824624 Oct 16 '20 at 01:28

2 Answers


This might not fully answer the question, but: from a MySQL perspective, your CREATE TABLE statement is not valid SQL and would raise a syntax error. The reason is that the VARCHAR datatype requires a length (the maximum number of characters that the column can hold).

For example:

CREATE TABLE nba_player4 (
    first_name VARCHAR(20),
    last_name  VARCHAR(20),
    email      VARCHAR(50),
    id         VARCHAR(10)
);

Now this is valid MySQL code. Still, I would furthermore recommend defining a primary key in the table. This is good practice in database design for many reasons, one of them being the ability to uniquely identify each record: this makes it possible to accurately select a given record with a WHERE clause, or to build foreign key constraints referencing the table. A column called id might be a good candidate for that - and would probably be better defined as an auto-incremented integer.

So, maybe:

CREATE TABLE nba_player4 (
    first_name VARCHAR(20),
    last_name  VARCHAR(20),
    email      VARCHAR(50),
    id         INT PRIMARY KEY AUTO_INCREMENT
);
GMB
  • It is my mistake that id should be INT instead of STRING; however, that is not the key to this problem. Thanks anyway. – user824624 Oct 11 '20 at 19:52

Can you verify the versions of all the components you use? Most probably you are not using the 1.9 version of Flink, as I can see it produces a new format of data type properties which was introduced in later versions.

In Flink 1.9 (at least in 1.9.3, which I checked) the properties should have the format schema.#.type, while in your case you have schema.#.data-type.

I'd recommend either upgrading to the newest Flink version or at least making sure you use the same version across all components.
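As a quick sanity check of the client-side version, a minimal sketch (assuming the apache-flink package exposes a pyflink.version module, as recent releases do):

# Print the PyFlink version used by the job client;
# it should match the cluster, e.g. 1.11.2.
from pyflink.version import __version__
print(__version__)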

Dawid Wysakowicz
  • As you said, I was using the latest Flink version, so I tried to pull the flink:1.9.3-scala_2.12 container and started the jobmanager and taskmanager, but the code is still producing the same error. Very weird. – user824624 Oct 13 '20 at 10:49
  • As you said, it might not be an issue with the container but with the pyflink library. I tried the pyflink library in version 1.9.3, and that seems to work. However, an API issue happens when the code executes "BT_ENV.sql_query(sql).execute()": it tells me AttributeError: 'Table' object has no attribute 'execute' – user824624 Oct 13 '20 at 11:40
  • Unfortunately I am not familiar with the python library. You cannot mix different component versions. You must use the same version of all the components. Moreover I suggest upgrading to 1.11.2. – Dawid Wysakowicz Oct 13 '20 at 11:55
  • BTW, I think the `Table#execute` method was added in 1.11. – Dawid Wysakowicz Oct 13 '20 at 11:57
  • I have experimented again: if I upgrade the Python library to 1.11.2, then the schema.#.data-type properties come back. – user824624 Oct 13 '20 at 14:04