Questions tagged [apache-beam-kafkaio]

61 questions
0 votes, 1 answer

GroupByKey() with Apache Beam

I am trying to stream messages from a Kafka consumer into 30-second windows using Apache Beam. I used beam_nuggets.io for reading from a Kafka topic. You can see my code below: with beam.Pipeline(options=PipelineOptions()) as p: consumer_message…
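The 30-second grouping this question describes corresponds to Beam's fixed windows: each timestamped element is assigned to a window before GroupByKey groups elements per key per window. The assignment arithmetic can be sketched in plain Python (a toy model of what Beam's FixedWindows does, not the Beam API itself):

```python
def fixed_window(timestamp_s, size_s=30):
    """Return the [start, end) fixed window a timestamp falls into.

    Toy model of the assignment Beam's FixedWindows performs; after
    assignment, GroupByKey groups elements per key *per window*.
    """
    start = timestamp_s - (timestamp_s % size_s)
    return (start, start + size_s)

# Elements at t=3s and t=29s land in the same 30-second window;
# t=31s lands in the next one.
print(fixed_window(3))   # (0, 30)
print(fixed_window(29))  # (0, 30)
print(fixed_window(31))  # (30, 60)
```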
0 votes, 1 answer

KafkaIO withBootstrapServers

I am trying to get a server ID as a parameter while executing the run command, using ValueProvider. For the ValueProvider in the Options interface: ValueProvider getKafkaServer(); void setKafkaServer(ValueProvider
0 votes, 1 answer

How to use withDynamicRead with KafkaIO in Apache Beam

I'm using read with KafkaIO in Apache Beam and I'm trying to call withDynamicRead. I also have a basic call of withCheckStopReadingFn: .withCheckStopReadingFn(new SerializableFunction() { @Override public Boolean…
artofdoe
0 votes, 1 answer

Handling Avro messages with the Apache Beam KafkaIO Python SDK

I am currently trying to read the messages with a ByteDeserializer, similar to the following KafkaIO example. My test setup is as follows: Option 1: configured to use --runner=PortableRunner. Option 2: start the local Flink job server, docker run…
Vim
0 votes, 1 answer

How to process Avro input from Kafka (with Apache Beam) when there are multiple subjects on one topic?

In order to process Avro-encoded messages with Apache Beam using KafkaIO, one needs to pass an instance of ConfluentSchemaRegistryDeserializerProvider as the value deserializer. A typical example looks like this: PCollection
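The multiple-subjects-on-one-topic situation usually comes from Confluent's subject naming strategies. How a registry subject is derived under each strategy can be sketched in plain Python (this illustrates the naming convention only; ConfluentSchemaRegistryDeserializerProvider itself is configured with a single subject):

```python
def subject_name(strategy, topic, record_fullname=None, is_key=False):
    """Derive the schema-registry subject for a value (or key) schema.

    Mirrors Confluent's three standard naming strategies:
    TopicNameStrategy (the default, one subject per topic),
    RecordNameStrategy and TopicRecordNameStrategy (which allow
    multiple record types, hence multiple subjects, on one topic).
    """
    suffix = "key" if is_key else "value"
    if strategy == "TopicNameStrategy":
        return f"{topic}-{suffix}"
    if strategy == "RecordNameStrategy":
        return record_fullname
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_fullname}"
    raise ValueError(f"unknown strategy: {strategy}")

print(subject_name("TopicNameStrategy", "orders"))
# orders-value
print(subject_name("TopicRecordNameStrategy", "orders", "com.acme.Order"))
# orders-com.acme.Order
```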
0 votes, 0 answers

Apache Beam Kafka Source Connector Idle Partition Issue with "CustomTimestampPolicyWithLimitedDelay"

Kafka is the source for our Beam pipeline. Apache Beam's Kafka IO connector supports advancing the watermark (in the case of the Flink runner) even if a partition is idle. Applications that want to process packets based on the timestamp of the packet…
Jay Ghiya
0 votes, 1 answer

Apache Beam, KafkaIO at-least-once semantics

We are implementing a pilot that reads from Kafka and writes to BigQuery. Simple pipeline: KafkaIO.read, BigQueryIO.write. We switched off auto-commit, and we are using commitOffsetsInFinalize(). Can this setup guarantee that messages will appear…
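commitOffsetsInFinalize() commits Kafka offsets only after a bundle's results have been finalized, which is what gives at-least-once behavior: after a failure, the reader resumes from the last committed offset and may re-emit records that were processed but whose offsets had not yet been committed. A toy model of that replay behavior (plain Python, not the Beam API):

```python
def resume_after_crash(records, committed_offset):
    """Records re-read after a restart: everything past the last
    committed offset. Records processed before the crash but not yet
    finalized are replayed (possible duplicates, no losses) — the
    essence of at-least-once delivery.
    """
    return [value for offset, value in records if offset > committed_offset]

records = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
# Suppose "a".."c" were processed but only offset 1 was committed
# before the crash: "c" is re-delivered, nothing is lost.
print(resume_after_crash(records, committed_offset=1))  # ['c', 'd']
```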
0 votes, 1 answer

KafkaIO checkpoint persistence with Google Dataflow Runner

I am trying to understand how offsets and group management work with the Google Dataflow runner's KafkaIO reader. More specifically, I am trying to understand how offset management works: if the group.id config is set and if auto-commit and…
Viraj
0 votes, 2 answers

Consuming messages from Google PubSub and publishing them to Kafka

I am trying to consume Google PubSub messages using the synchronous PULL API, which is available in the Apache Beam Google PubSub IO connector library. I want to write the consumed messages to Kafka using KafkaIO. I want to use FlinkRunner to execute the…
0 votes, 0 answers

How do the concepts of checkpointing and fault tolerance work in Apache Beam?

I am working on an Apache Beam streaming pipeline with a Kafka producer as input and a consumer for the output. Can anyone help me out with checkpointing in Apache Beam?
0 votes, 1 answer

Write to multiple Kafka topics in Apache Beam?

I am executing a simple word-count program where I used one Kafka topic (producer) as an input source, then I apply a ParDo to it to calculate the word count. Now I need help writing the words to different topics on the basis of their frequency.…
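Routing to different topics by frequency comes down to tagging each element with a destination before the sink; the routing decision itself is ordinary code. A sketch with made-up topic names and a made-up threshold (in a Beam pipeline this decision would live inside a ParDo/Map that emits (topic, record) pairs for the Kafka sink):

```python
def route_by_frequency(word, count, threshold=100):
    """Pick a destination Kafka topic from a word's count.

    The topic names "hot-words"/"cold-words" and the threshold are
    hypothetical; any predicate on the count works the same way.
    """
    topic = "hot-words" if count >= threshold else "cold-words"
    return (topic, word)

print(route_by_frequency("the", 5000))    # ('hot-words', 'the')
print(route_by_frequency("zyzzyva", 1))   # ('cold-words', 'zyzzyva')
```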
0 votes, 2 answers

How to infer an Avro schema from a Kafka topic in Apache Beam KafkaIO

I'm using Apache Beam's KafkaIO to read from a topic that has an Avro schema in the Confluent schema registry. I'm able to deserialize the messages and write to files, but ultimately I want to write to BigQuery. My pipeline isn't able to infer the…
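Getting from an Avro schema to a BigQuery table schema ultimately means mapping Avro types onto BigQuery column types. The core of that mapping can be sketched as follows (a deliberately partial sketch: flat records only, ignoring unions, logical types, and nested records):

```python
# Partial Avro-primitive to BigQuery type mapping (no logical types).
AVRO_TO_BQ = {
    "string": "STRING",
    "int": "INTEGER",
    "long": "INTEGER",
    "float": "FLOAT",
    "double": "FLOAT",
    "boolean": "BOOLEAN",
    "bytes": "BYTES",
}

def bq_fields(avro_record_schema):
    """Translate a flat Avro record schema (as a dict) into a list of
    BigQuery field descriptors. Sketch only: unions, logical types,
    and nested records are not handled.
    """
    return [
        {"name": f["name"], "type": AVRO_TO_BQ[f["type"]]}
        for f in avro_record_schema["fields"]
    ]

schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
}
print(bq_fields(schema))
# [{'name': 'id', 'type': 'INTEGER'}, {'name': 'email', 'type': 'STRING'}]
```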
0 votes, 0 answers

KafkaIO with Apache Beam stuck in infinite loop on DirectRunner

I'm trying to run this simple example where data from a Kafka topic are filtered out: https://www.talend.com/blog/2018/08/07/developing-data-processing-job-using-apache-beam-streaming-pipeline/ I have a similar setup with a localhost broker with…
artofdoe
0 votes, 1 answer

How to set AvroCoder with KafkaIO and Apache Beam with Java

I'm trying to create a pipeline that streams data from a Kafka topic to Google's BigQuery. Data in the topic is in Avro. I call the apply function 3 times: once to read from Kafka, once to extract the record, and once to write to BigQuery. Here is the…
artofdoe
0 votes, 1 answer

Apache Beam KafkaIO producer routing different messages to different topics

I have a use case where the incoming data has a key that identifies different types of data. There's a single input Kafka topic where all types of data are thrown at it. The Beam pipeline reads all the messages from the input Kafka topic and has…
bigbounty