I'm trying to run this simple example where data from a Kafka topic are filtered out: https://www.talend.com/blog/2018/08/07/developing-data-processing-job-using-apache-beam-streaming-pipeline/

I have a similar setup with a localhost broker using default settings, but I can't even read from the topic.

When I run the application, it gets stuck in an infinite loop and nothing happens. I've tried giving a gibberish URL for my broker to see if it's even able to reach it - it's not. The cluster is up and running and I'm able to add messages to the topic. Here is where I specify the broker and the topic:

    pipeline
        .apply(
                KafkaIO.<Long, String>read()
                        .withBootstrapServers("localhost:9092")
                        .withTopic("BEAM_IN")
                        .withKeyDeserializer(LongDeserializer.class)
                        .withValueDeserializer(StringDeserializer.class))
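
The rest of the pipeline follows the read-filter-write shape from the blog post; roughly like this (the class name and the filter predicate are placeholders, not my exact code):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.values.KV;
    import org.apache.kafka.common.serialization.LongDeserializer;
    import org.apache.kafka.common.serialization.LongSerializer;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KafkaFilterExample {
        public static void main(String[] args) {
            Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

            pipeline
                .apply("ReadFromKafka",
                        KafkaIO.<Long, String>read()
                                .withBootstrapServers("localhost:9092")
                                .withTopic("BEAM_IN")
                                .withKeyDeserializer(LongDeserializer.class)
                                .withValueDeserializer(StringDeserializer.class)
                                // drop Kafka metadata, leaving a PCollection<KV<Long, String>>
                                .withoutMetadata())
                // placeholder predicate: keep records with non-empty values
                .apply("FilterValues", Filter.by((KV<Long, String> kv) -> !kv.getValue().isEmpty()))
                .apply("WriteToKafka",
                        KafkaIO.<Long, String>write()
                                .withBootstrapServers("localhost:9092")
                                .withTopic("BEAM_OUT")
                                .withKeySerializer(LongSerializer.class)
                                .withValueSerializer(StringSerializer.class));

            pipeline.run().waitUntilFinish();
        }
    }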

I don't see any errors and there is nothing written to the output topic.

When debugging, I see it's stuck in this loop:

    while (Instant.now().isBefore(completionTime)) {
        ExecutorServiceParallelExecutor.VisibleExecutorUpdate update =
                this.visibleUpdates.tryNext(Duration.millis(25L));
        if (update == null && ((State) this.pipelineState.get()).isTerminal()) {
            return (State) this.pipelineState.get();
        }

        if (update != null) {
            if (this.isTerminalStateUpdate(update)) {
                return (State) this.pipelineState.get();
            }

            if (update.thrown.isPresent()) {
                Throwable thrown = (Throwable) update.thrown.get();
                if (thrown instanceof Exception) {
                    throw (Exception) thrown;
                }

                if (thrown instanceof Error) {
                    throw (Error) thrown;
                }

                throw new Exception("Unknown Type of Throwable", thrown);
            }
        }
    }

This loop is in the isKeyed(PValue pvalue) method within the ExecutorServiceParallelExecutor class.

What am I missing?

artofdoe
  • KafkaIO is an unbounded source by default (if you don't use "withMaxNumRecords()" or "withMaxReadTime()" to make it bounded), so it's fine that it runs continuously. How do you check that it didn't read any messages from the input topic? – Alexey Romanenko Jun 22 '20 at 11:53
  • I've tried adding withMaxNumRecords(5) and I've manually put 10 messages in my input topic. I don't see anything in my output topic, which I've specified, so I'm assuming it's not being read. Maybe it's being read but not being written to the output topic – artofdoe Jun 22 '20 at 13:31
  • Actually when I add messages *after* I start the application, it terminates now. But I still don't see anything in my output topic – artofdoe Jun 22 '20 at 13:43
  • "When the pipeline starts for the first time, or without any checkpoint, the source starts consuming from the latest offsets. You can override this behavior to consume from the beginning by setting appropriate properties in ConsumerConfig, through Read#withConsumerConfigUpdates(Map). You can also enable offset auto_commit in Kafka to resume from last committed" – Alexey Romanenko Jun 22 '20 at 15:51 (a config sketch based on this follows after the comments)
  • Could you share your pipeline code? – Alexey Romanenko Jun 22 '20 at 15:51
  • @AlexeyRomanenko thanks for the input. I did manage to figure it out, but I've since come across another issue. I've posted a separate question; any input would be appreciated. https://stackoverflow.com/questions/62544980/how-to-infer-avro-schema-from-a-kafka-topic-in-apache-beam-kafkaio – artofdoe Jun 25 '20 at 21:02
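
Following up on the comments above, a minimal sketch of how the read could consume from the beginning of the topic and be made bounded so the pipeline can terminate (the record count is arbitrary and Map.of assumes Java 9+; treat this as an assumption, not the asker's actual fix):

    import java.util.Map;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.LongDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    KafkaIO.<Long, String>read()
            .withBootstrapServers("localhost:9092")
            .withTopic("BEAM_IN")
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // consume from the earliest offsets instead of only new messages
            .withConsumerConfigUpdates(
                    Map.<String, Object>of(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"))
            // make the otherwise-unbounded source bounded so waitUntilFinish() can return
            .withMaxNumRecords(10)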

0 Answers