I am trying to run a pipeline using Apache Beam, with one Kafka topic as the source and another Kafka topic as the destination. I have written my code and it appears to run without errors, but I cannot see any data in my output topic. This is the code:

import apache_beam as beam
import apache_beam.transforms.window as window
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.external.kafka import ReadFromKafka, WriteToKafka

def run_pipeline():
    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'Read from Kafka' >> ReadFromKafka(
                consumer_config={'bootstrap.servers': 'localhost:9092',
                                 'auto.offset.reset': 'latest'},
                topics=['demo'])
         | 'Window of 10 seconds' >> beam.WindowInto(window.FixedWindows(10))
         # | 'Group by key' >> beam.GroupByKey()
         | 'Write to Kafka' >> WriteToKafka(
                producer_config={'bootstrap.servers': 'localhost:9092'},
                topic='demo_output'))
        # | 'Write to console' >> beam.Map(print)
        # | 'Write to text' >> beam.io.WriteToText('outputfile.txt')

if __name__ == '__main__':
    run_pipeline()

https://maximilianmichels.com/2020/getting-started-with-beam-python/

This is the actual blog post that I am trying to follow.

I used the console producer to write messages to my source Kafka topic:

$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo --property "parse.key=true" --property "key.separator=:"
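
With `parse.key=true` and `key.separator=:`, each line typed into the console producer is split on the `:` into a message key and a value, so the test input looks something like this (the keys and values here are purely illustrative):

key1:hello
key2:world
key1:hello again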

But I am still not able to see my messages arriving in the destination topic when I try to consume it:

$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo_output
    `--topic demo_output` is not the same topic used in your code. Also, after you group by a key, you probably need to do something with that grouping? For example, the linked blog post sums all the values for the same key **then** writes to Kafka. – OneCricketeer Jun 03 '22 at 15:17
  • I agree with @OneCricketeer; as he mentions, you are not using the same `topic`. – Jose Gutierrez Paliza Jun 03 '22 at 20:43
  • Sorry about that, I edited my code so it no longer uses GroupByKey and changed the Kafka topic to the correct one, but it still has the same problem. I even tried outputting to the console and to a text file, but nothing seems to work. – yashdeepKumar Jun 04 '22 at 06:58
  • I don't think you can write window objects into Kafka, either. Did you try just consuming from one topic and writing directly to another? (A minimal sketch of that test follows these comments.) – OneCricketeer Jun 04 '22 at 14:01
  • Yes. Working with kafka topics isn't my goal. I wanted to perform a small streaming beam pipeline locally. So I wanted to do this. – yashdeepKumar Jun 04 '22 at 15:36
  • Is there an error that you can see in the Logs? – Jose Gutierrez Paliza Jun 10 '22 at 21:08
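
A minimal sketch of the direct read-then-write test suggested in the comments, with no windowing or GroupByKey in between. ReadFromKafka and WriteToKafka are cross-language (Java-backed) transforms, so they need a runner that supports them; the Flink options below are an assumption based on the setup the linked blog post uses, and localhost:8081 is just the default Flink address:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
# Same import as in the question; newer Beam releases also expose these
# transforms under apache_beam.io.kafka.
from apache_beam.io.external.kafka import ReadFromKafka, WriteToKafka

def run_passthrough():
    # ReadFromKafka/WriteToKafka are cross-language (Java) transforms, so the
    # pipeline needs a runner that supports them and a local Java installation
    # for the automatically started Kafka expansion service. The Flink address
    # below is an assumption based on the linked blog post's setup.
    options = PipelineOptions([
        '--runner=FlinkRunner',
        '--flink_master=localhost:8081',
        '--environment_type=LOOPBACK',
        '--streaming',
    ])
    with beam.Pipeline(options=options) as p:
        (p
         # With the default deserializers the records arrive as (key, value)
         # pairs of bytes; 'earliest' makes the pipeline pick up messages that
         # were produced before it started.
         | 'Read from Kafka' >> ReadFromKafka(
                consumer_config={'bootstrap.servers': 'localhost:9092',
                                 'auto.offset.reset': 'earliest'},
                topics=['demo'])
         # No windowing or GroupByKey: the (key, value) pairs go straight back out.
         | 'Write to Kafka' >> WriteToKafka(
                producer_config={'bootstrap.servers': 'localhost:9092'},
                topic='demo_output'))

if __name__ == '__main__':
    run_passthrough()

If nothing reaches demo_output even with this stripped-down version, the problem is more likely the runner or expansion-service setup than the windowing.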

1 Answer

Does the GBK have any output on your job details page? It's possible that your topic sends very old data (event time), so the records are all treated as late data and discarded by the pipeline. A similar issue is Apache Beam GroupByKey Produces No Output
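
If that is what is happening, a rough way to test it is to give the window a late-firing trigger and a generous allowed lateness before the GroupByKey, so old records are emitted instead of silently dropped. This is only a sketch: `records` stands for the PCollection coming out of ReadFromKafka, and the one-hour allowed lateness is an arbitrary value that should be larger than the age of your test messages.

import apache_beam as beam
import apache_beam.transforms.window as window
from apache_beam.transforms.trigger import AccumulationMode, AfterProcessingTime, AfterWatermark
from apache_beam.utils.timestamp import Duration

def window_with_late_data(records):
    # Fires when the watermark passes the end of the 10-second window and fires
    # again for records that arrive late; anything up to an hour behind the
    # watermark is kept instead of being dropped before the GroupByKey.
    return (records
            | 'Window of 10 seconds' >> beam.WindowInto(
                window.FixedWindows(10),
                trigger=AfterWatermark(late=AfterProcessingTime(0)),
                accumulation_mode=AccumulationMode.DISCARDING,
                allowed_lateness=Duration(seconds=3600))
            | 'Group by key' >> beam.GroupByKey())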

ningk
  • Sorry about the edit I just made. Even without the GBK, just pushing the latest data straight to the destination topic is not working. In fact, even outputting to the console or to a text file doesn't work. – yashdeepKumar Jun 04 '22 at 07:00