Questions tagged [apache-beam-kafkaio]

61 questions
4
votes
2 answers

How to perform checkpointing in Apache Beam while using the Flink runner?

I am reading from an unbounded source (Kafka) and writing its word count to another Kafka topic. Now I want to perform checkpointing in the Beam pipeline. I have followed all the instructions in the Apache Beam documentation, but the checkpoint directory is not…
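A minimal sketch of enabling Flink checkpointing through FlinkPipelineOptions; the interval and storage path are placeholders, and the state-backend setters assume a Beam version that exposes them (roughly 2.30+):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    options.setCheckpointingInterval(60_000L);                   // checkpoint every 60 s
    options.setStateBackend("rocksdb");                          // assumed setter, Beam 2.30+
    options.setStateBackendStoragePath("file:///tmp/beam-ckpt"); // placeholder path
    Pipeline pipeline = Pipeline.create(options);

Without a checkpointing interval set, the Flink runner leaves checkpointing disabled, which would match the symptom of an empty checkpoint directory.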
3
votes
0 answers

KafkaIO GroupId after restart

I am using Apache Beam's KafkaIO to read from a Kafka topic. Everything works as expected, but if my job is terminated and restarted, the new job generates a new groupID and therefore ends up reading from the beginning of the…
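Pinning the group id instead of letting KafkaIO generate one per job, combined with committing offsets on finalize, is the usual way to make a restart resume where the previous job stopped. A sketch; broker, topic, and group names are placeholders:

    import java.util.Map;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;

    KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092")                  // placeholder
        .withTopic("my-topic")                                // placeholder
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withConsumerConfigUpdates(Map.<String, Object>of(
            ConsumerConfig.GROUP_ID_CONFIG, "my-fixed-group", // stable across restarts
            ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest"))
        .commitOffsetsInFinalize();                           // resume from committed offsets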
user3693309
2
votes
1 answer

KafkaIO - Different behaviors for enable.auto.commit set to true and commitOffsetsInFinalize when used with groupId

We have an Apache Beam pipeline that reads messages from a given Kafka topic and does further processing. My pipeline uses the FlinkRunner, and I have described three different cases that we have tried: Case 1: No group id specified: Beam…
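For reference, the two commit behaviors being compared are configured like this (a sketch; group name is a placeholder, imports as in the previous sketch):

    // Case A: the Kafka client auto-commits on a timer, independent of processing.
    KafkaIO.<String, String>read()
        .withConsumerConfigUpdates(Map.<String, Object>of(
            ConsumerConfig.GROUP_ID_CONFIG, "my-group",
            ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true));

    // Case B: Beam commits offsets only when a checkpoint/bundle finalizes.
    KafkaIO.<String, String>read()
        .withConsumerConfigUpdates(Map.<String, Object>of(
            ConsumerConfig.GROUP_ID_CONFIG, "my-group"))
        .commitOffsetsInFinalize();

Case B ties the committed position to successful finalization (at-least-once on restart with the same group id), whereas Case A can commit offsets for records that were never fully processed.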
2
votes
1 answer

commitOffsetsInFinalize() and checkpoints in Apache Beam

I am working on a Beam application that uses KafkaIO as an input: KafkaIO.read().withBootstrapServers("bootstrapServers").withTopic("topicName").withConsumerConfigUpdates(confs)…
user3693309
1
vote
1 answer

Unable to use KafkaIO with Flink Runner

I am trying to use a KafkaIO read with the Flink runner on Beam 2.45.0, and I am seeing the following issue: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: No translator known for…
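"No translator known for ..." failures are often reported when the Flink runner tries to translate the newer splittable-DoFn-based Read. A commonly suggested workaround (an assumption that it applies to this exact error) is forcing the legacy read path via the use_deprecated_read experiment:

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.options.ExperimentalOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    // Fall back to the legacy UnboundedSource translation for KafkaIO.
    ExperimentalOptions.addExperiment(options.as(ExperimentalOptions.class), "use_deprecated_read");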
1
vote
0 answers

Error in data flow pipeline using windows and triggers

I have a streaming pipeline that reads a Kafka topic containing JSON, converts it to a class object, groups by id, and sends an output JSON to another Kafka topic. The problem is that sending more than 500 JSON messages at the same time to…
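A generic Java sketch of the windowing-and-grouping shape described (types, durations, and trigger settings are placeholders, not the asker's code):

    import org.apache.beam.sdk.transforms.GroupByKey;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    static PCollection<KV<String, Iterable<String>>> windowAndGroup(
        PCollection<KV<String, String>> keyedJson) {          // id -> JSON payload
      return keyedJson
          .apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardSeconds(10)))
              .triggering(AfterWatermark.pastEndOfWindow()
                  .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                      .plusDelayOf(Duration.standardSeconds(5))))
              .withAllowedLateness(Duration.standardMinutes(1)) // absorb bursts of late data
              .discardingFiredPanes())
          .apply(GroupByKey.create());
    }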
1
vote
1 answer

How to manually commit Kafka offsets after FileIO in Apache Beam?

I have a FileIO writing a PCollection to files and returning a WriteFilesResult. I would like to create a DoFn after writing the files to commit the offsets of the written records to Kafka, but since my offsets are stored in my…
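One pattern that matches this description is gating the commit step on the write output with Wait.on; the offset collection and the committing DoFn below are hypothetical stand-ins:

    import org.apache.beam.sdk.io.WriteFilesResult;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.Wait;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    static PCollection<KV<String, Long>> commitAfterWrite(
        PCollection<KV<String, Long>> offsets,      // partition -> offset, tracked upstream (hypothetical)
        WriteFilesResult<Void> writeResult) {
      return offsets
          .apply(Wait.on(writeResult.getPerDestinationOutputFilenames())) // wait until files are written
          .apply(ParDo.of(new CommitOffsetsFn()));  // hypothetical DoFn calling consumer.commitSync()
    }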
1
vote
0 answers

Apache Beam Consumer Prefix

I'm trying to set up a simple pipeline using Apache Beam to read data from Kafka. As it is a test, I run the pipeline on a DirectRunner. My consumer group needs to be prefixed with X for authorization reasons, but Apache Beam uses an internal…
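A hedged sketch of forcing the group name (and with it the required prefix) through the consumer config, as in the group-id sketch further up; whether every internal consumer Beam creates honors the override may depend on the Beam version:

    .withConsumerConfigUpdates(Map.<String, Object>of(
        ConsumerConfig.GROUP_ID_CONFIG, "X-my-test-group")) // placeholder with required prefix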
nerdizzle
1
vote
1 answer

Apache Beam ReadFromKafka vs KafkaConsume

I'm working with a simple Apache Beam pipeline that consists of reading from an unbounded Kafka topic and printing the values out, run via the Flink runner. I have two flavors of this. Version 1: with beam.Pipeline(options=beam_options) as…
1
vote
1 answer

How to read Kafka record ingestion timestamp in Apache Beam

I am new to Apache Beam and have been struggling with this problem for a while. I am using KafkaIO as the source of my pipeline in Apache Beam Java. I want to fetch the Kafka record ingestion timestamp along with every record and write it as an additional…
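When the read is left with metadata, KafkaIO emits KafkaRecord elements whose getTimestamp() exposes the broker timestamp (the ingestion time when the topic is configured with LogAppendTime). A sketch; broker and topic are placeholders:

    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.io.kafka.KafkaRecord;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.kafka.common.serialization.StringDeserializer;

    pipeline
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")          // placeholder
            .withTopic("my-topic")                        // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withLogAppendTime())                         // element timestamps = broker append time
        .apply(ParDo.of(new DoFn<KafkaRecord<String, String>, String>() {
          @ProcessElement
          public void process(@Element KafkaRecord<String, String> rec, OutputReceiver<String> out) {
            out.output(rec.getKV().getValue() + "," + rec.getTimestamp()); // value + ingestion ts
          }
        }));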
manoveg
1
vote
1 answer

gRPC Error with Docker on Mac - Kafka Stream Processing with Python, Beam, and Flink

Update: I spun up an EC2 instance and was able to get the example below to work, which confirms that this is a connectivity issue with Docker on Mac. Update: I still face this error even when I bring down the Flink server container and Kafka, which…
1
vote
1 answer

Beam Kafka Streaming Input, No Output to print or text

I'm trying to count Kafka message keys using the direct runner. If I put max_num_records=20 in ReadFromKafka, I can see the results printed or output to text, like: ('2102', 5) ('2706', 5) ('2103', 5) ('2707', 5). But without max_num_records, or if…
1
vote
1 answer

How to consume Kafka messages with a protobuf definition in Apache Beam?

I'm using the KafkaIO unbounded source in an Apache Beam pipeline running on Dataflow. The following configuration works for me: Map kafkaConsumerConfig = new HashMap() {{ put("auto.offset.reset", "earliest"); …
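One workable shape is to read the value as raw bytes and parse the protobuf in a MapElements step; MyEvent here is a hypothetical generated message class:

    import com.google.protobuf.InvalidProtocolBufferException;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.Values;
    import org.apache.beam.sdk.values.TypeDescriptor;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    pipeline
        .apply(KafkaIO.<String, byte[]>read()
            .withBootstrapServers("broker:9092")          // placeholder
            .withTopic("events")                          // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(ByteArrayDeserializer.class)
            .withoutMetadata())                           // -> PCollection<KV<String, byte[]>>
        .apply(Values.create())
        .apply(MapElements.into(TypeDescriptor.of(MyEvent.class)).via(bytes -> {
          try {
            return MyEvent.parseFrom(bytes);              // protobuf-generated parser
          } catch (InvalidProtocolBufferException e) {
            throw new RuntimeException("unparseable record", e);
          }
        }));

The resulting PCollection<MyEvent> will also need a coder, e.g. ProtoCoder from the Beam protobuf extension (an assumption about the project setup).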
1
vote
1 answer

GCP Dataflow Kafka (as Azure Event Hub) -> BigQuery

TL;DR: I have a Kafka-enabled Azure Event Hub that I'm trying to connect to from Google Cloud's Dataflow service to stream the data into Google BigQuery. I can successfully use the Kafka CLI to talk to the Azure Event Hub. However, with GCP, after 5…
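For the Kafka endpoint of an Event Hubs namespace, the usual consumer settings are SASL_SSL with the PLAIN mechanism and the connection string as the password; a sketch with a placeholder namespace and connection string:

    import java.util.Map;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.kafka.common.serialization.StringDeserializer;

    String jaas = "org.apache.kafka.common.security.plain.PlainLoginModule required "
        + "username=\"$ConnectionString\" "
        + "password=\"Endpoint=sb://<namespace>.servicebus.windows.net/;...\";"; // placeholder

    KafkaIO.<String, String>read()
        .withBootstrapServers("<namespace>.servicebus.windows.net:9093") // Kafka endpoint port
        .withTopic("my-event-hub")                                       // placeholder
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withConsumerConfigUpdates(Map.<String, Object>of(
            "security.protocol", "SASL_SSL",
            "sasl.mechanism", "PLAIN",
            "sasl.jaas.config", jaas));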
1
vote
1 answer

Apache Beam HTTP Unbounded Source Python

Is it possible with the current version of Apache Beam to develop an unbounded source that receives data in an HTTP message? My intention is to run an HTTP server and inject the received messages into a Beam pipeline. If it is possible, can it be…