
This is an Apache Flink example with PyFlink: Link

I want to read records from a Kafka topic and print them with PyFlink. Producing to the Kafka topic works, but I can't read from that topic.
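For context, the producer side works for me with roughly the following sketch (the topic name, field schema, and JAR path are placeholders for my setup):

# Producer sketch; topic name, schema, and JAR path are placeholders.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import FlinkKafkaProducer
from pyflink.datastream.formats.json import JsonRowSerializationSchema

env = StreamExecutionEnvironment.get_execution_environment()
env.add_jars("file:///home/path-to-kafka-connector/flink-sql-connector-kafka-1.17.0.jar")

row_type = Types.ROW_NAMED(
    ["id", "name", "high_price", "low_price"],
    [Types.INT(), Types.STRING(), Types.DOUBLE(), Types.DOUBLE()])

serialization_schema = JsonRowSerializationSchema.Builder() \
    .with_type_info(row_type) \
    .build()

producer = FlinkKafkaProducer(
    topic='topic_name',
    serialization_schema=serialization_schema,
    producer_config={'bootstrap.servers': 'localhost:9092'})

# Write a single test record to the topic.
env.from_collection([(1, 'test', 2.0, 1.0)], type_info=row_type) \
    .add_sink(producer)
env.execute()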

Error when consuming from the Kafka topic:

raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o0.execute.
: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
...
Caused by: java.lang.RuntimeException: Failed to create stage bundle factory! INFO:root:Initializing Python harness

My Kafka cluster is running.

I tried this with both pyflink-1.17.0 and pyflink-1.15.4.


1 Answer


Here is the complete working code; update the schema to match your records:

from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import FlinkKafkaConsumer
from pyflink.datastream.formats.json import JsonRowDeserializationSchema
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

settings = EnvironmentSettings.new_instance() \
    .in_streaming_mode() \
    .build()

# The Kafka connector JAR must be available to the job; adjust the path.
env.add_jars("file:///home/path-to-kafka-connector/flink-sql-connector-kafka-1.17.0.jar")
env.set_parallelism(2)

# Create the table environment.
tbl_env = StreamTableEnvironment.create(stream_execution_environment=env,
                                        environment_settings=settings)

# Describe the JSON records in the topic.
deserialization_schema = JsonRowDeserializationSchema.Builder() \
    .type_info(Types.ROW_NAMED(
        ["id", "name", "high_price", "low_price"],
        [Types.INT(), Types.STRING(), Types.DOUBLE(), Types.DOUBLE()])) \
    .build()

kafka_consumer = FlinkKafkaConsumer(
    topics='topic_name',
    deserialization_schema=deserialization_schema,
    properties={'bootstrap.servers': 'localhost:9092', 'group.id': 'group_name'})

kafka_consumer.set_start_from_earliest()


def update_openTime(row):
    # Placeholder for your own per-record transformation; passes rows through unchanged.
    return row


ds = env.add_source(kafka_consumer)
ds.map(update_openTime).print()
env.execute()
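Two things to watch for: the connector JAR version must match the Flink version you run (the 1.17.0 JAR above will not work with pyflink-1.15.4), and update_openTime is just a pass-through placeholder here, so replace it with your own mapping or drop the .map() call entirely.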