
I am trying to write a stream to a Kafka topic using the WriteToKafka class of Apache Beam (Python SDK). However, the script runs endlessly without writing the stream to the topic; it never stops and never raises an error, so I have to cancel the run. Any help is appreciated. Below is a minimal example to reproduce the issue.

from typing import Tuple
import os

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.kafka import WriteToKafka

pipeline_options = PipelineOptions(
    runner='FlinkRunner'
)


def convert_to_int(row: str) -> int:
    print(row)
    return int(row)

bootstrap_servers = 'localhost:9092'
topic = 'test'

folder_path = os.path.dirname(__file__)
input_file = os.path.join(folder_path, 'data/test.txt')
serializer = 'org.apache.kafka.common.serialization.LongSerializer'
with beam.Pipeline(options=pipeline_options) as p:

    stream = (p 
        | "left read" >> beam.io.ReadFromText(input_file)
        # | 'With timestamps' >> beam.Map(lambda event: beam.window.TimestampedValue(event, current_timestamp_ms()))
        | 'type cast' >> beam.Map(convert_to_int).with_output_types(int)
        # Kafka write transforms expect key/value pairs.
        | beam.Map(lambda x: (x, x)).with_output_types(Tuple[int, int])
        | 'kafka_write' >> WriteToKafka(
            producer_config={
                'bootstrap.servers': bootstrap_servers
                },
            topic=topic,
            key_serializer=serializer,
            value_serializer=serializer,
            )
        )


The data/test.txt file contains:

1
2
3

BTW, I have double-checked the topic and the producer config.

akurmustafa
  • Maybe about this https://beam.apache.org/documentation/sdks/java-multi-language-pipelines/ ? – Metehan Yıldırım Jul 06 '22 at 14:00
  • @OneCricketeer I have edited the question to include minimal example. Kind regards – akurmustafa Jul 06 '22 at 14:10
  • Have you been able to identify what step is having problems? Can you read from the text file and output to a separate text file, for example? Can you write elements to Kafka directly with beam.Create? Also if its running as a streaming pipeline instead of batch it might expect windowing, so can you try adding windowing to the pipeline? – Daniel Oliveira Jul 13 '22 at 02:37
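
Following the windowing suggestion in the last comment, here is a minimal sketch of a stripped-down variant that swaps the file read for beam.Create and windows the elements explicitly. It assumes the same broker and topic as the question and is untested:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.kafka import WriteToKafka
from typing import Tuple

serializer = 'org.apache.kafka.common.serialization.LongSerializer'

with beam.Pipeline(options=PipelineOptions(runner='FlinkRunner')) as p:
    _ = (p
        # Replace the file read with in-memory elements to rule out the source.
        | beam.Create([1, 2, 3])
        # Explicit fixed windows, in case the runner treats this as a streaming pipeline.
        | beam.WindowInto(beam.window.FixedWindows(10))
        | beam.Map(lambda x: (x, x)).with_output_types(Tuple[int, int])
        | WriteToKafka(
            producer_config={'bootstrap.servers': 'localhost:9092'},
            topic='test',
            key_serializer=serializer,
            value_serializer=serializer))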

1 Answer


To see what was written to the Kafka topic I was using the

bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092

command. bin/kafka-console-consumer.sh assumes the key and value are Strings and uses "org.apache.kafka.common.serialization.StringDeserializer" to deserialize the recorded data. Since the data written here is in Long format (Python ints are serialized as longs by the LongSerializer above), one should use the command below to successfully deserialize the data in the topic:

bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092 --key-deserializer "org.apache.kafka.common.serialization.LongDeserializer" --value-deserializer "org.apache.kafka.common.serialization.LongDeserializer"
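
For context on why the default consumer output looks garbled: Kafka's LongSerializer writes each value as 8 big-endian bytes. A small Python sketch of the same encoding, for illustration only:

import struct

# LongSerializer encodes a Java long as 8 big-endian bytes,
# so the int 3 from test.txt becomes b'\x00\x00\x00\x00\x00\x00\x00\x03'.
encoded = struct.pack('>q', 3)
print(encoded)  # b'\x00\x00\x00\x00\x00\x00\x00\x03'

# LongDeserializer reverses the same encoding.
print(struct.unpack('>q', encoded)[0])  # 3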
akurmustafa