Questions tagged [apache-beam-kafkaio]

61 questions
0 votes, 0 answers

How to create a KafkaRecord in Apache Beam Manually for Unit Tests

I'm doing an Apache Beam-based implementation, and data is taken from a Kafka stream into the pipeline through a KafkaIO. After reading the data, I have a few PTransforms to process the input data and I need to unit test the first PTransform that…
Prasad · 83
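For a question like this, the usual approach is to build the record by hand and feed it into a TestPipeline. A minimal sketch, assuming the public `KafkaRecord` constructor in the Beam Java SDK (the topic name, offset, and key/value here are hypothetical placeholders):

```java
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.KafkaTimestampType;
import org.apache.beam.sdk.values.KV;

// Construct a KafkaRecord directly, without a running broker.
KafkaRecord<String, String> record =
    new KafkaRecord<>(
        "test-topic",                    // topic (hypothetical)
        0,                               // partition
        42L,                             // offset
        1_600_000_000_000L,              // timestamp (epoch millis)
        KafkaTimestampType.CREATE_TIME,  // how the timestamp was assigned
        null,                            // headers (nullable)
        KV.of("key", "value"));          // the key/value pair under test
```

The record can then go into the pipeline with `Create.of(record)` (setting a `KafkaRecordCoder` if the runner cannot infer one), so the first PTransform after the KafkaIO read can be exercised in isolation.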
0 votes, 1 answer

Disable Direct Runner Logs in Apache Beam for Kafka Consumer

I've seen a similar question asked, but about Dataflow logging rather than direct-runner logging. Basically, I want to turn off the wave of KafkaIO read (consumer) logs. I have tried setting the logging levels in the SDK harness as follows. var kafkasLogs = …
Lemon · 43
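With the direct runner, the consumer chatter can usually be silenced at the logging-framework level rather than through Beam itself. A minimal sketch, assuming the pipeline logs through `java.util.logging` (e.g. the slf4j-jdk14 binding); with a logback or log4j binding, the same logger-name override applies in that framework's configuration instead:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietKafkaLogs {
    public static void main(String[] args) {
        // Raise the threshold for the Kafka consumer's logger so that
        // per-poll INFO messages are dropped, keeping only warnings and errors.
        Logger kafka = Logger.getLogger("org.apache.kafka.clients.consumer");
        kafka.setLevel(Level.WARNING);
        System.out.println(kafka.isLoggable(Level.INFO)); // false after the override
    }
}
```

The override must run (or the equivalent `logging.properties` entry must load) before the pipeline starts, since the consumer caches its logger at class-load time.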
0 votes, 1 answer

Apache Beam KafkaIO Reader & Writer - Error handling and Retry mechanism

I'm working on an Apache Beam Pipeline-based implementation and I consume data from a Kafka stream. After doing some processing I need to publish the processed data into three different Kafka topics. As the runner, I use Apache Flink. My question…
0 votes, 3 answers

Apache Beam KafkaIO - Write to Multiple Topics

Currently, I'm working on an Apache Beam pipeline implementation that consumes data from three different Kafka topics and, after some processing, creates three types of objects from the data taken from those topics. Finally,…
Prasad · 83
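One common answer to the multiple-topics question is to let each element carry its own destination. A minimal sketch, assuming `KafkaIO.writeRecords()` (which takes the topic from each `ProducerRecord`); the `MyEvent` type and the `orders`/`payments` routing are hypothetical:

```java
// Route each element to its destination topic, then write with one sink.
PCollection<ProducerRecord<String, String>> routed =
    processed.apply("Route by type",
        ParDo.of(new DoFn<MyEvent, ProducerRecord<String, String>>() {
          @ProcessElement
          public void process(@Element MyEvent e,
                              OutputReceiver<ProducerRecord<String, String>> out) {
            // Pick the topic per element (hypothetical routing logic).
            String topic = e.isOrder() ? "orders" : "payments";
            out.output(new ProducerRecord<>(topic, e.key(), e.payload()));
          }
        }));

routed.apply(KafkaIO.<String, String>writeRecords()
    .withBootstrapServers("broker:9092")
    .withKeySerializer(StringSerializer.class)
    .withValueSerializer(StringSerializer.class));
```

The alternative is three separate `KafkaIO.write()` sinks fed by a multi-output `ParDo`; `writeRecords()` keeps it to a single sink at the cost of building `ProducerRecord`s yourself.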
0 votes, 0 answers

Avro schema parser ignoring logical type for byte type

I am trying to parse an Avro schema string into a Schema object using the Avro lib. When parsing, the parser seems to ignore the logical type provided in the Avro schema, causing the deserialization of JSON data not to work properly. Sample Avro schema (JSON…
vkt · 1,401
0 votes, 1 answer

Send BigQuery table rows to Kafka as Avro messages using Apache Beam

I need to publish the BigQuery table rows to Kafka in Avro format. PCollection rows = pipeline .apply( "Read from BigQuery query", …
vkt · 1,401
0 votes, 0 answers

Google Dataflow with "Workflow failed"

I'm working on a simple Beam dataflow job in Java on Google Cloud Platform. I've tested it locally and the pipeline runs well. When I deploy it on Dataflow, I get this looping error: { insertId: "10mkpdlb7i" labels: {4} logName:…
0 votes, 1 answer

Apache Beam Pipeline KafkaIO - Commit offset manually

I have a Beam pipeline to consume streaming events with multiple stages (PTransforms) to process them. See the following code, pipeline.apply("Read Data from Stream", StreamReader.read()) .apply("Decode event and extract relevant…
Prasad · 83
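For manual offset commits, KafkaIO exposes a read-side option rather than a per-stage API. A minimal sketch, assuming `commitOffsetsInFinalize()` on `KafkaIO.Read` (which requires a consumer `group.id`); the broker, topic, and group names are hypothetical:

```java
pipeline.apply("Read Data from Stream",
    KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092")
        .withTopic("events")                               // hypothetical topic
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // commitOffsetsInFinalize requires a group.id on the consumer.
        .withConsumerConfigUpdates(
            Map.of(ConsumerConfig.GROUP_ID_CONFIG, "my-group"))
        // Commit offsets back to Kafka when the runner finalizes a checkpoint,
        // instead of relying on the consumer's auto-commit.
        .commitOffsetsInFinalize());
```

Note that offsets are committed at checkpoint finalization, not after a specific downstream PTransform completes, so this gives at-least-once rather than stage-level acknowledgement.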
0 votes, 1 answer

Failing Apache Beam Pipeline when consuming events through KafkaIO on Flink runner

I have a beam pipeline with several stages that consumes data through a KafkaIO and the code looks like below, pipeline.apply("Read Data from Stream", StreamReader.read()) .apply("Decode event and extract relevant fields", ParDo.of(new…
Prasad · 83
0 votes, 2 answers

How to consume Avro Serialized messages from AWS MSK via Apache Beam

PCollection> kafkaRecordPCollection = pipeline.apply( KafkaIO.read() .withBootstrapServers("bootstrap-server") .withTopic("topic") …
0 votes, 0 answers

Apache Beam KafkaIO - Set truststore file(jks) location in Kafka consumer properties

I am running an Apache Beam Java app in Spark client mode using YARN. On spark-submit, the jks file is copied to the working directory of the Spark executors. But the reference to this path in the Apache Beam KafkaIO config parameter is not…
Kartik · 39
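The truststore location goes into the consumer properties that KafkaIO forwards to the Kafka client. A minimal sketch, assuming `withConsumerConfigUpdates()` and the standard Kafka SSL config keys; the broker, topic, and path here are hypothetical, and the path must resolve on every executor, not just the driver:

```java
KafkaIO.<String, String>read()
    .withBootstrapServers("broker:9093")
    .withTopic("secure-topic")                         // hypothetical topic
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    // Standard Kafka client SSL settings, passed through unchanged.
    .withConsumerConfigUpdates(Map.of(
        "security.protocol", "SSL",
        "ssl.truststore.location", "/opt/secrets/truststore.jks",  // hypothetical path
        "ssl.truststore.password", "changeit"));
```

When the jks file is shipped via `spark-submit --files`, the executor-side working-directory path (not the driver's local path) is what the consumer opens, which is a frequent source of "file not found" failures here.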
0 votes, 1 answer

Apache Beam WriteToKafka (Python SDK) doesn't write to topic (no error manifested)

I am trying to write a stream to a Kafka topic using the WriteToKafka class of Apache Beam (Python SDK). However, it runs the script endlessly (without error) and doesn't write the stream to the topic. I have to cancel the run; it doesn't stop, it doesn't give…
0 votes, 1 answer

How to expose Kafka metrics using Beam KafkaIO in Python?

I want to be able to expose my consumer and producer metrics in a Python-written Beam pipeline that uses the KafkaIO library. Examples of the metrics I mean are the ones you get from the Python confluent-kafka library…
RMCP · 11
0 votes, 1 answer

Apache Beam Issue with Spark Runner while using Kafka IO

I am trying to test KafkaIO for Apache Beam code with the Spark runner. The code works fine with the direct runner. However, if I add the code line below, it throws an error: options.setRunner(SparkRunner.class); Error: ERROR…
0 votes, 1 answer

How can I simulate event lateness in Apache Beam reading from a Kafka Source

I am trying to tweak the windowing parameters in my streaming Beam pipeline. The parameters I am modifying are withAllowedLateness, triggers, interval, pane firing, etc. However, I don't know how to trigger lateness in my Kafka-consuming pipeline…
Fabio · 555
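For questions like this last one, lateness is usually simulated off Kafka entirely: Beam's `TestStream` lets a test hold back the watermark and then emit elements behind it. A minimal sketch, assuming `TestStream` (supported by the direct runner) and a one-minute fixed window; the element values and timestamps are hypothetical:

```java
// Emit one on-time element, advance the watermark past the window end,
// then emit an element whose timestamp is behind the watermark (i.e. late).
TestStream<String> events =
    TestStream.create(StringUtf8Coder.of())
        .addElements(TimestampedValue.of("on-time", new Instant(0)))
        .advanceWatermarkTo(new Instant(60_000))
        .addElements(TimestampedValue.of("late", new Instant(5_000)))
        .advanceWatermarkToInfinity();

pipeline.apply(events)
    .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
        .withAllowedLateness(Duration.standardMinutes(5))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withLateFirings(AfterPane.elementCountAtLeast(1)))
        .accumulatingFiredPanes());
```

Because `TestStream` controls the watermark directly, the same `withAllowedLateness`/trigger settings can be exercised deterministically, without trying to produce genuinely late records through a live Kafka topic.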