Questions tagged [apache-beam-kafkaio]
61 questions
4
votes
2 answers
How to perform checkpointing in apache beam while using flink runner?
I am reading from an unbounded source (Kafka) and writing its word count to another Kafka topic. Now I want to perform checkpointing in the Beam pipeline. I have followed all the instructions in the Apache Beam documentation, but the checkpoint directory is not…

Akul Sharma
- 69
- 5
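For the checkpoint question above: with the Flink runner, Beam itself only controls the checkpointing interval; the checkpoint directory (state.checkpoints.dir) is configured on the Flink cluster, not through Beam, which is the usual reason the directory never appears. A minimal sketch, assuming the Beam Java SDK; the interval is a placeholder:

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CheckpointedWordCount {
  public static void main(String[] args) {
    FlinkPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    options.setStreaming(true);
    // The Flink runner enables checkpointing only when an interval is set;
    // this requests a checkpoint every 10 seconds.
    options.setCheckpointingInterval(10_000L);

    Pipeline pipeline = Pipeline.create(options);
    // ... KafkaIO read -> word count -> KafkaIO write ...
    pipeline.run();
  }
}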
3
votes
0 answers
KafkaIO GroupId after restart
I am using Apache Beam's KafkaIO to read from a Kafka topic. Everything works as expected, but if my job is terminated and restarted, a new group ID is generated by the new job, so it ends up reading from the beginning of the…

user3693309
- 343
- 4
- 14
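The usual remedy for the restart problem above is to pin group.id explicitly and commit offsets when bundles finalize, so a restarted job resumes from the committed position instead of starting a fresh generated group. A sketch; the broker address, topic, and group name are assumptions:

import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

KafkaIO.Read<String, String> read = KafkaIO.<String, String>read()
    .withBootstrapServers("broker:9092")
    .withTopic("my-topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    // Fixed group.id instead of the per-job generated one.
    .withConsumerConfigUpdates(Map.of(
        ConsumerConfig.GROUP_ID_CONFIG, "my-stable-group",
        ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest"))
    // Commit offsets as bundles are finalized so the next run,
    // using the same group.id, picks up where this one stopped.
    .commitOffsetsInFinalize();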
2
votes
1 answer
KafkaIO - Different behaviors for enable.auto.commit set to true and commitOffsetsInFinalize when used with groupId
We have an Apache Beam pipeline that reads messages from a given Kafka topic and does further processing. My pipeline uses the FlinkRunner, and I have described three different cases that we have tried:
Case 1: No group id specified:
Beam…

user2859928
- 21
- 1
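For reference, the two commit strategies this question contrasts look roughly as follows; the deserializers and names are assumptions filled in for illustration:

import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

// Shared base read; KafkaIO.Read is immutable, so it can be branched.
KafkaIO.Read<String, String> base = KafkaIO.<String, String>read()
    .withBootstrapServers("broker:9092")
    .withTopic("my-topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class);

// enable.auto.commit=true: the Kafka client commits on its own timer,
// possibly for records the pipeline has not finished processing.
KafkaIO.Read<String, String> autoCommit = base.withConsumerConfigUpdates(Map.of(
    ConsumerConfig.GROUP_ID_CONFIG, "my-group",
    ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true"));

// commitOffsetsInFinalize(): offsets are committed when Beam finalizes a
// bundle (on Flink, when a checkpoint completes), tying commits to done work.
KafkaIO.Read<String, String> onFinalize = base
    .withConsumerConfigUpdates(Map.of(ConsumerConfig.GROUP_ID_CONFIG, "my-group"))
    .commitOffsetsInFinalize();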
2
votes
1 answer
commitOffsetsInFinalize() and checkpoints in Apache Beam
I am working on a Beam application that uses KafkaIO as an input:
KafkaIO.read()
    .withBootstrapServers("bootstrapServers")
    .withTopic("topicName")
    .withConsumerConfigUpdates(confs)
    …

user3693309
- 343
- 4
- 14
1
vote
1 answer
Unable to use KafkaIO with Flink Runner
I am trying to use KafkaIO read with the Flink runner on Beam version 2.45.0.
I am seeing the following issue:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: No translator known for…

Aditya Tiwari
- 11
- 1
1
vote
0 answers
Error in data flow pipeline using windows and triggers
I have a streaming pipeline that reads from a Kafka topic, converts each JSON message to a class object, groups by ID, and sends an output JSON to another Kafka topic.
The problem is that sending more than 500 JSON messages at the same time to…

Ricardo Ortega
- 11
- 2
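Without the full pipeline it is hard to be definitive, but bursts like the 500-message case above usually call for an explicit window and trigger so panes fire incrementally instead of buffering. A sketch, assuming a keyed PCollection<KV<String, String>> named events; the durations are placeholders:

import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

PCollection<KV<String, String>> windowed = events.apply(
    Window.<KV<String, String>>into(FixedWindows.of(Duration.standardSeconds(30)))
        .triggering(AfterWatermark.pastEndOfWindow()
            // Fire early every 5 seconds so a burst does not have to
            // wait for the watermark before producing output.
            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(5))))
        .withAllowedLateness(Duration.standardMinutes(1))
        .discardingFiredPanes());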
1
vote
1 answer
How to manually commit kafka offset after FileIO in apache beam?
I have a FileIO that writes a PCollection to files and returns a WriteFilesResult.
I would like to create a DoFn after writing the files to commit the offsets of the written records to Kafka, but since my offsets are stored in my…

Jean Wisser
- 55
- 9
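One workaround for the question above is to carry each record's (partition, offset) through the pipeline and, after the FileIO write, commit the highest offset per partition with a plain Kafka consumer. Entirely a sketch; CommitOffsetsFn and its wiring off WriteFilesResult are assumptions, not existing KafkaIO API:

import java.util.Collections;
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Hypothetical: input is the max offset written per partition.
class CommitOffsetsFn extends DoFn<KV<TopicPartition, Long>, Void> {
  // Must be serializable and include bootstrap.servers, group.id,
  // and byte-array key/value deserializers.
  private final Map<String, Object> consumerConfig;

  CommitOffsetsFn(Map<String, Object> consumerConfig) {
    this.consumerConfig = consumerConfig;
  }

  @ProcessElement
  public void processElement(@Element KV<TopicPartition, Long> maxOffset) {
    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerConfig)) {
      // Kafka convention: commit the offset of the next record to read.
      consumer.commitSync(Collections.singletonMap(
          maxOffset.getKey(), new OffsetAndMetadata(maxOffset.getValue() + 1)));
    }
  }
}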
1
vote
0 answers
Apache Beam Consumer Prefix
I'm trying to set up a simple pipeline using Apache Beam to read data from Kafka. As it is a test, I run the pipeline on the DirectRunner. My consumer group needs to be prefixed with X for authorization reasons, but Apache Beam uses an internal…

nerdizzle
- 424
- 4
- 17
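KafkaIO also starts a second, internal consumer to track backlog, and its group id is generated, which is typically what trips prefix-based authorization. Recent Beam releases expose withOffsetConsumerConfigOverrides for this; a sketch, assuming that method exists in your version, with placeholder names:

import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

KafkaIO.Read<String, String> read = KafkaIO.<String, String>read()
    .withBootstrapServers("broker:9092")
    .withTopic("my-topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    // Main consumer: group id carrying the required prefix.
    .withConsumerConfigUpdates(Map.of(
        ConsumerConfig.GROUP_ID_CONFIG, "X-my-pipeline"))
    // Internal backlog-tracking consumer: give it the prefix too.
    .withOffsetConsumerConfigOverrides(Map.of(
        ConsumerConfig.GROUP_ID_CONFIG, "X-my-pipeline-offsets"));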
1
vote
1 answer
Apache Beam ReadFromKafka vs KafkaConsume
I'm working with a simple Apache Beam pipeline consisting of reading from an unbounded Kafka topic and printing the values out. I have two flavors of this, both run via the Flink runner.
Version 1
with beam.Pipeline(options=beam_options) as…

Benjamin Tan Wei Hao
- 9,621
- 3
- 30
- 56
1
vote
1 answer
How to read Kafka record ingestion timestamp in Apache Beam
I am new to Apache Beam and have been struggling with this problem for a while.
I am using KafkaIO as the source of my pipeline in the Apache Beam Java SDK.
I want to fetch the Kafka record ingestion timestamp along with every record and write it as an additional…

manoveg
- 423
- 1
- 3
- 13
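By default, KafkaIO.read() (without withoutMetadata()) emits KafkaRecord, which carries the broker-assigned timestamp, so a DoFn can read it directly; KafkaIO.Read also offers withLogAppendTime() to use that timestamp as the element timestamp. A sketch, assuming String keys and values; note the topic must use LogAppendTime for this to be true ingestion time:

import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Pairs each value with the timestamp the broker stored on the record.
class ExtractTimestampFn extends DoFn<KafkaRecord<String, String>, KV<String, Long>> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    KafkaRecord<String, String> record = c.element();
    c.output(KV.of(record.getKV().getValue(), record.getTimestamp()));
  }
}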
1
vote
1 answer
GRPC Error Docker Mac - Kafka Stream Processing with Python, Beam, and Flink
Update: I spun up an EC2 instance and was able to get the example below to work, which confirms that this is a connectivity issue with Docker on Mac.
Update: I still face this error even when I bring down the Flink Server Container and Kafka, which…

Mauricio Ortiz
- 51
- 3
1
vote
1 answer
Beam Kafka Streaming Input, No Output to print or text
I'm trying to count Kafka message keys using the direct runner.
If I put max_num_records=20 in ReadFromKafka, I can see the results printed or output to text,
like:
('2102', 5)
('2706', 5)
('2103', 5)
('2707', 5)
But without max_num_records, or if…

CannonFodder
- 11
- 3
1
vote
1 answer
How to consume Kafka messages with a protobuf definition in Apache Beam?
I'm using the KafkaIO unbounded source in an Apache Beam pipeline running on Dataflow. The following configuration works for me:
Map<String, Object> kafkaConsumerConfig = new HashMap<String, Object>() {{
    put("auto.offset.reset", "earliest");
    …

Ira Re
- 730
- 3
- 9
- 25
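A common pattern for the protobuf question above is a thin Kafka Deserializer around the generated message class, paired with ProtoCoder so Beam can encode the values downstream. A sketch; MyEvent and MyEventDeserializer are hypothetical names:

import com.google.protobuf.InvalidProtocolBufferException;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Hypothetical: MyEvent is a protoc-generated message class.
class MyEventDeserializer implements Deserializer<MyEvent> {
  @Override
  public MyEvent deserialize(String topic, byte[] data) {
    try {
      return MyEvent.parseFrom(data);
    } catch (InvalidProtocolBufferException e) {
      throw new RuntimeException("Malformed protobuf on topic " + topic, e);
    }
  }
}

KafkaIO.Read<String, MyEvent> read = KafkaIO.<String, MyEvent>read()
    .withBootstrapServers("broker:9092")
    .withTopic("my-topic")
    .withKeyDeserializer(StringDeserializer.class)
    // Pair the deserializer with a coder Beam can use for MyEvent.
    .withValueDeserializerAndCoder(
        MyEventDeserializer.class, ProtoCoder.of(MyEvent.class));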
1
vote
1 answer
GCP Dataflow Kafka (as Azure Event Hub) -> BigQuery
TL;DR:
I have a Kafka-enabled Azure Event Hub that I'm trying to connect to from Google Cloud's Dataflow service to stream the data into Google BigQuery. I can successfully use the Kafka CLI to talk to the Azure Event Hub. However, with GCP, after 5…

technogeek1995
- 3,185
- 2
- 31
- 52
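The usual sticking point with Event Hubs' Kafka endpoint is authentication: it listens on port 9093 and requires SASL_SSL with the PLAIN mechanism, using the literal username $ConnectionString and the namespace connection string as the password. A sketch of the consumer config; the namespace and hub name are placeholders:

import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

Map<String, Object> eventHubConfig = Map.of(
    "security.protocol", "SASL_SSL",
    "sasl.mechanism", "PLAIN",
    "sasl.jaas.config",
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
        + "username=\"$ConnectionString\" "
        + "password=\"Endpoint=sb://<namespace>.servicebus.windows.net/;...\";",
    "request.timeout.ms", "60000");

KafkaIO.Read<String, String> read = KafkaIO.<String, String>read()
    .withBootstrapServers("<namespace>.servicebus.windows.net:9093")
    .withTopic("my-event-hub")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .withConsumerConfigUpdates(eventHubConfig);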
1
vote
1 answer
Apache Beam HTTP Unbounded Source Python
Is it possible, with the current version of Apache Beam, to develop an unbounded source that receives data in an HTTP message?
My intention is to run an HTTP server and inject the received messages into a Beam pipeline. If it is possible, can it be…

David lara
- 11
- 1