
I am new to Flink and Kafka. I started ZooKeeper and Kafka on Windows and tried to run the 'Kafka With Json Format' example from the official website in a Python environment.

import logging
import sys

from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import FlinkKafkaProducer, FlinkKafkaConsumer
from pyflink.datastream.formats.json import JsonRowSerializationSchema, JsonRowDeserializationSchema


# Make sure that the Kafka cluster is started and the topic 'test_json_topic' is
# created before executing this job.
def write_to_kafka(env):
    type_info = Types.ROW([Types.INT(), Types.STRING()])
    ds = env.from_collection(
        [(1, 'hi'), (2, 'hello'), (3, 'hi'), (4, 'hello'), (5, 'hi'), (6, 'hello'), (6, 'hello')],
        type_info=type_info)

    serialization_schema = JsonRowSerializationSchema.Builder() \
        .with_type_info(type_info) \
        .build()
    kafka_producer = FlinkKafkaProducer(
        topic='test_json_topic',
        serialization_schema=serialization_schema,
        producer_config={'bootstrap.servers': 'localhost:9092', 'group.id': 'test_group'}
    )

    # note that the output type of ds must be RowTypeInfo
    ds.add_sink(kafka_producer)
    env.execute()


def read_from_kafka(env):
    deserialization_schema = JsonRowDeserializationSchema.Builder() \
        .type_info(Types.ROW([Types.INT(), Types.STRING()])) \
        .build()
    kafka_consumer = FlinkKafkaConsumer(
        topics='test_json_topic',
        deserialization_schema=deserialization_schema,
        properties={'bootstrap.servers': 'localhost:9092', 'group.id': 'test_group_1'}
    )
    kafka_consumer.set_start_from_earliest()

    env.add_source(kafka_consumer).print()
    env.execute()


if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")

    env = StreamExecutionEnvironment.get_execution_environment()
    env.add_jars("file:///path/to/flink-sql-connector-kafka-1.15.0.jar")

    print("start writing data to kafka")
    write_to_kafka(env)

    print("start reading data from kafka")
    read_from_kafka(env)

I wrote data from Flink to Kafka and tried to read it back, but when `env.execute()` runs in the read_from_kafka function, the job fails with the following error:

Caused by: java.lang.RuntimeException: Failed to create stage bundle factory!
INFO:root:Initializing Python harness: C:\Users\hp.conda\envs\myenv\lib\site-packages\pyflink\fn_execution\beam\beam_boot.py --id=5-1 --provision_endpoint=localhost:58376
INFO:root:Starting up Python harness in loopback mode.

I don't know how to solve this. Monitoring with Kafka Tool, I can see that the data is written to the topic successfully, but it cannot be read from it: the offset of consumer group test_group_1 never changes while its lag keeps growing. I would appreciate any help getting the program to run normally and print the results.

  • `file:///` paths are specific to Unix systems, not Windows; you're missing `c:` as part of that I think. Also, try running these as two separate apps since the producer and consumer may start on different threads at the same time – OneCricketeer Jul 25 '23 at 21:59
  • I have changed `add_jars` to a Windows-friendly path and tried running the write and read operations as two separate files, but the same error still appears. I can see that the data is written to the topic successfully (the write file runs fine), but it still cannot be read from the topic. – Sherri Jul 26 '23 at 11:25
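
For reference, a minimal sketch of the split that the first comment describes: a consumer-only script with a Windows-style `file:///` URL (keep the drive letter, use forward slashes). The jar location shown here is hypothetical and must be replaced with the real path on the machine; the producer would go into a second, equally small script.

# read_only.py -- consumer-only job, run separately from the producer script.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import FlinkKafkaConsumer
from pyflink.datastream.formats.json import JsonRowDeserializationSchema


def main():
    env = StreamExecutionEnvironment.get_execution_environment()
    # Windows-style jar URL: drive letter plus forward slashes.
    # The actual connector jar location is machine-specific (hypothetical here).
    env.add_jars("file:///C:/flink/jars/flink-sql-connector-kafka-1.15.0.jar")

    deserialization_schema = JsonRowDeserializationSchema.Builder() \
        .type_info(Types.ROW([Types.INT(), Types.STRING()])) \
        .build()
    kafka_consumer = FlinkKafkaConsumer(
        topics='test_json_topic',
        deserialization_schema=deserialization_schema,
        properties={'bootstrap.servers': 'localhost:9092', 'group.id': 'test_group_1'}
    )
    kafka_consumer.set_start_from_earliest()

    env.add_source(kafka_consumer).print()
    env.execute()


if __name__ == '__main__':
    main()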
