
I am trying to find a way to deploy pyflink on k8s using the k8s operator. I have already been able to submit a job with the k8s operator, but I can't figure out how to add connectors to it (like kafka-connector.jar or kinesis-connector.jar). I couldn't find any more documentation on how to use pyflink with the k8s operator, and I am not familiar with Java, so it's a dead end for me.

I am basing my setup on this repo, which uses a FlinkDeployment.yaml to deploy a demo pyflink job that sinks to console (assuming there's a k8s cluster with the operator already running to apply it to). I have followed it and it works. But now I am trying to figure out how to add the source/sink connector .jar files to it.

I've followed the documentation on how to use connectors in Python, which worked well locally, but for some reason not with the k8s operator.

The steps I followed were to add the connector flink-sql-connector-kinesis-1.16.2.jar to my Docker image, and then, in the Python file (also included in the image), add the code to reference it. This works well locally:
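For context, the relevant part of my Dockerfile looks roughly like this (base image tag and file locations are from my setup, treat them as placeholders):

```dockerfile
FROM flink:1.16.2

# Python + PyFlink installation steps omitted here

# Bake the job script and the connector jar into the image,
# so the script can find the jar under its own lib/ directory
RUN mkdir -p /opt/flink/usrlib/lib
COPY python_demo.py /opt/flink/usrlib/python_demo.py
COPY flink-sql-connector-kinesis-1.16.2.jar /opt/flink/usrlib/lib/
```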

import os

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)
t_env = StreamTableEnvironment.create(stream_execution_environment=env)
CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))

# CURRENT_DIR is already absolute, so "file://" + path yields a valid file URI
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file://" + CURRENT_DIR + "/lib/flink-sql-connector-kinesis-1.16.2.jar",
)
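As a debugging aid (this is my own helper, not part of the Flink API), one could fail fast if the jar path doesn't actually resolve inside the container before handing it to `pipeline.jars`:

```python
import os


def jar_uri(base_dir: str, jar_name: str) -> str:
    """Return a file:// URI for a connector jar under base_dir/lib,
    raising immediately if the jar is missing from the image."""
    path = os.path.join(base_dir, "lib", jar_name)
    if not os.path.isfile(path):
        raise FileNotFoundError("Connector jar not found: " + path)
    return "file://" + path
```

That way a missing or misplaced jar surfaces as a clear Python error at startup instead of a Java classpath error later.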

And then I create a sink table using Kinesis:

    t_env.execute_sql(
        """
        CREATE TABLE print_table (<columns...>)
        WITH (
          'connector' = 'kinesis',
          'stream' = '<stream_name>',
          'aws.region' = '<aws_region>',
          'sink.partitioner-field-delimiter' = ';',
          'sink.batch.max-size' = '100',
          'format' = 'json',
          'json.timestamp-format.standard' = 'ISO-8601'
        )
        """
    )

But when I create the FlinkDeployment based on that repo, the job can't find Kinesis, probably because this is not the right way to include a connector jar in a pyflink job submitted with the k8s operator (which uses Java rather than Python, I assume?). The Java error I get is: Could not find any factory for identifier 'kinesis' that implements 'org.apache.flink.table.factories.DynamicTableFactory' in the classpath.

The way I am submitting my job is with this manifest, based on that repo; my Python file inside the image is indeed found at /opt/flink/usrlib/python_demo.py:

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: python-kinesis-smoke
spec:
  image: <docker_hub_repo>/pyflink_kinesis:latest
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/opt/flink-python_2.12-1.16.1.jar # Note, this jarURI is actually a placeholder
    entryClass: "org.apache.flink.client.python.PythonDriver"
    args: ["-pyclientexec", "/usr/local/bin/python3", "-py", "/opt/flink/usrlib/python_demo.py"]
    parallelism: 1
    upgradeMode: stateless

I have already read all the Flink K8s Operator documentation and couldn't find any mention of pyflink, only of submitting jobs packaged entirely as .jar files, which is not my use case since I am on pyflink. I also found this other repo that uses a manifest of kind FlinkCluster, but I couldn't make it work: the k8s cluster says a kind FlinkCluster does not exist.

Does anyone have pointers on how to deploy pyflink with connectors using the k8s operator, in Application or Session mode? I believe my only other option would be to submit with the CLI, which I would like to avoid in favor of the k8s operator if possible.
