I am trying to read from a Kafka topic and write the same records to another Kafka topic using KafkaSource/KafkaSink in PyFlink (Flink version 1.16). Reading from the Kafka topic works and I am able to print the result, but when I try to send the records to Kafka using KafkaSink I get the following exception:
NOTE: Picked up JDK_JAVA_OPTIONS: --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED
Traceback (most recent call last):
  File "/home/.../PycharmProjects/reddit-anomaly-detection-job/main.py", line 75, in <module>
    main()
  File "/home/.../PycharmProjects/reddit-anomaly-detection-job/main.py", line 49, in main
    kafka_producer = KafkaSink.builder() \
  File "/home/.../.conda/envs/reddit-anomaly-detection-job/lib/python3.9/site-packages/pyflink/datastream/connectors/kafka.py", line 963, in set_record_serializer
    get_field_value(j_topic_selector, 'topicSelector').getClass().getCanonicalName()
AttributeError: 'NoneType' object has no attribute 'startswith'
The code is:
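from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream.connectors.kafka import KafkaRecordSerializationSchema, KafkaSink

# Build a record serializer that writes each value as a plain string to the sink topic
# (kafka_sink_topic and bootstrap_servers are defined earlier in main())
record_serializer = KafkaRecordSerializationSchema.builder() \
    .set_topic(kafka_sink_topic) \
    .set_value_serialization_schema(SimpleStringSchema()) \
    .build()

# Create the Kafka sink; this is the call that fails in set_record_serializer
kafka_producer = KafkaSink.builder() \
    .set_bootstrap_servers(bootstrap_servers) \
    .set_record_serializer(record_serializer) \
    .build()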
UPDATE: The problem seems to come from my local environment. The same code runs on Ververica on top of a custom Python image. I tried to follow this article, adapting it to Kafka, but it does not work locally in PyCharm.
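Since the same code works in one environment and not the other, one thing that can be compared is the JVM that PyFlink starts locally. The `JDK_JAVA_OPTIONS` note in the output is only printed by Java 9 and later, and Flink 1.16 documents support for Java 8 and 11, so a newer local JDK is a possible (unconfirmed) difference. The snippet below is only a diagnostic sketch under that assumption: `parse_java_major` and `local_java_major` are hypothetical helper names, and it inspects the `java` found on `PATH`, which may not be the one PyFlink uses if `JAVA_HOME` points elsewhere.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_362" means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def local_java_major() -> Optional[int]:
    """Return the major version of the `java` executable on PATH, if any."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` normally writes to stderr; stdout is included as a fallback.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Local Java major version:", local_java_major())
```

Running this in the same conda environment as `main.py` and comparing the result with the Java version in the custom image would show whether the two environments actually differ in this respect.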