Questions tagged [apache-beam-kafkaio]
61 questions
1 vote · 1 answer
Using the Beam Python SDK and PortableRunner to connect to Kafka with SSL
I have the code below for connecting to Kafka using the Python Beam SDK. I know that the ReadFromKafka transform is run in a Java SDK harness (Docker container), but I have not been able to figure out how to make ssl.truststore.location and…

Zareman
- 311
- 2
- 8
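These SSL settings are ordinary Kafka consumer properties; ReadFromKafka forwards its consumer_config entries to the Java KafkaIO that it expands to. A minimal Java sketch with placeholder broker, topic, and truststore path (getting the truststore file into the Java SDK harness container is a separate problem this sketch does not cover):

import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaSslReadSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9093")            // placeholder broker
            .withTopic("my-topic")                          // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // SSL options are plain Kafka consumer properties:
            .withConsumerConfigUpdates(Map.<String, Object>of(
                "security.protocol", "SSL",
                "ssl.truststore.location", "/path/to/truststore.jks",
                "ssl.truststore.password", "changeit")));
    pipeline.run().waitUntilFinish();
  }
}
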
1 vote · 1 answer
Estimating Watermark for Event Time in Beam
I'm trying to use Beam to aggregate over a set of data using event time from the data, with Kafka as the data source. This works if all my Kafka partitions are populated with data. However, as soon as a partition has not yet been written to, the watermark…

Robert156
- 41
- 3
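For reference, a sketch (placeholder names and delay) of the built-in CreateTime timestamp policy in the Java KafkaIO, which advances the watermark for partitions whose backlog is empty, subject to the configured allowed delay:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class KafkaEventTimeSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("events")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Use each record's CreateTime as the event time. The policy
            // advances the watermark for partitions that are caught up,
            // allowing at most this much delay.
            .withCreateTime(Duration.standardSeconds(30)));
    pipeline.run().waitUntilFinish();
  }
}
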
1 vote · 1 answer
Apache Beam KafkaIO consumers in consumer group getting assigned unique group id
I am running multiple instances of Apache Beam KafkaIO using the DirectRunner, all reading from the same topic. But each message is getting delivered to all running instances. After inspecting the Kafka configuration I found that the group name is getting appended with…

Aditya
- 207
- 2
- 13
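A sketch with placeholder names of pinning the group.id through the consumer config. Note that KafkaIO assigns topic partitions to its own readers rather than relying on Kafka's group rebalancing, so two separately launched pipelines will each read the whole topic regardless; the group.id mainly matters for offset commits.

import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaGroupIdSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Pins the group.id used by the consumer and for offset commits.
            // KafkaIO still assigns partitions itself, so this does not
            // spread the topic across separately launched pipelines.
            .withConsumerConfigUpdates(
                Map.<String, Object>of(ConsumerConfig.GROUP_ID_CONFIG, "my-group")));
    pipeline.run().waitUntilFinish();
  }
}
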
1 vote · 1 answer
The RemoteEnvironment cannot be used when submitting a program through a client, or running in a TestEnvironment context
I was trying to execute the Apache Beam word count with Kafka as input and output. But on submitting the jar to the Flink cluster, this error came up:
The RemoteEnvironment cannot be used when submitting a program through a client, or running in a…

Akul Sharma
- 69
- 5
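A sketch (not the asker's code) of the way a Beam job is usually wired up for submission through the Flink client: let FlinkPipelineOptions drive environment creation and leave flinkMaster at its default, so the runner reuses the environment provided by the client instead of constructing a remote one.

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkSubmitSketch {
  public static void main(String[] args) {
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    // When the fat jar is submitted with "flink run", leave flinkMaster at
    // its "[auto]" default so the runner picks up the client-provided
    // environment instead of building a RemoteEnvironment itself.
    Pipeline pipeline = Pipeline.create(options);
    // ... KafkaIO read/write transforms go here ...
    pipeline.run().waitUntilFinish();
  }
}
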
1 vote · 0 answers
How to infer schema from Confluent Schema Registry using Apache Beam?
I'm trying to create an Apache Beam pipeline where I read from a Kafka topic and load it into BigQuery. Using Confluent's Schema Registry, I should be able to infer the schema when loading into BigQuery. However, the schema is not being inferred…

artofdoe
- 167
- 2
- 14
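The Java SDK ships a deserializer provider that looks the Avro schema up in Confluent Schema Registry; a sketch with placeholder registry URL and subject (mapping the resulting GenericRecords to BigQuery rows is a separate step not shown here):

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SchemaRegistryReadSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, GenericRecord>read()
            .withBootstrapServers("broker:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            // Fetches the subject's Avro schema from the registry and decodes
            // each value into a GenericRecord with a matching coder.
            .withValueDeserializer(
                ConfluentSchemaRegistryDeserializerProvider.of(
                    "http://schema-registry:8081", "my-topic-value")));
    pipeline.run().waitUntilFinish();
  }
}
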
1 vote · 1 answer
How to specify Kafka brokers with KafkaIO in Apache Beam
I'm trying to set up a KafkaIO pipeline but I can't figure out how to specify brokers.
Specifying the broker name and port doesn't seem to do it. At no point am I specifying where my Kafka cluster is:
pipeline
.apply(KafkaIO.

artofdoe
- 167
- 2
- 14
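For reference, the broker list goes into withBootstrapServers as a comma-separated host:port string, which becomes the standard bootstrap.servers property; the addresses below are placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaBrokersSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, String>read()
            // Comma-separated host:port list; this becomes bootstrap.servers.
            .withBootstrapServers("kafka-1:9092,kafka-2:9092,kafka-3:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class));
    pipeline.run().waitUntilFinish();
  }
}
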
1 vote · 1 answer
Apache Beam KafkaIO consumers in consumer group reading same message
I'm using KafkaIO in Dataflow to read messages from one topic. I use the following code:
KafkaIO.read()
.withReadCommitted()
.withBootstrapServers(endPoint)
…

bigbounty
- 16,526
- 5
- 37
- 65
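The same caveat as above applies: partitions are distributed across the workers of one pipeline, not across separately launched pipelines, so each independent pipeline sees every message whatever group.id is set. A sketch, with placeholder names, of the offset-commit settings that let a restarted pipeline with the same group.id resume instead of re-reading:

import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaOffsetCommitSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withReadCommitted()
            // Commit offsets as bundles are finalized so a restarted pipeline
            // using the same group.id resumes from the committed position.
            .commitOffsetsInFinalize()
            .withConsumerConfigUpdates(
                Map.<String, Object>of(ConsumerConfig.GROUP_ID_CONFIG, "my-group")));
    pipeline.run().waitUntilFinish();
  }
}
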
0 votes · 0 answers
Capturing deserialization exceptions in KafkaIO
I have a typical KafkaIO-based source for reading an Avro-formatted key and value from a Kafka topic.
PCollection> records =
pipeline.apply(
"Read from Kafka",
KafkaIO.

pravish
- 33
- 8
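One commonly used pattern, sketched here with a hypothetical decode() helper standing in for the real Avro decoding: read raw byte arrays from KafkaIO and do the deserialization in a ParDo, so failures can be caught and routed to a dead-letter output instead of failing inside the source.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class KafkaDeadLetterSketch {
  static final TupleTag<String> OK = new TupleTag<String>() {};
  static final TupleTag<KV<byte[], byte[]>> FAILED = new TupleTag<KV<byte[], byte[]>>() {};

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    PCollectionTuple results =
        pipeline
            .apply(
                KafkaIO.<byte[], byte[]>read()
                    .withBootstrapServers("broker:9092")
                    .withTopic("my-topic")
                    .withKeyDeserializer(ByteArrayDeserializer.class)
                    .withValueDeserializer(ByteArrayDeserializer.class)
                    .withoutMetadata())
            .apply(
                ParDo.of(
                        new DoFn<KV<byte[], byte[]>, String>() {
                          @ProcessElement
                          public void process(ProcessContext c) {
                            try {
                              // decode() stands in for the real Avro decoding.
                              c.output(decode(c.element().getValue()));
                            } catch (Exception e) {
                              // Route the raw record to the dead-letter output.
                              c.output(FAILED, c.element());
                            }
                          }
                        })
                    .withOutputTags(OK, TupleTagList.of(FAILED)));
    // results.get(OK) and results.get(FAILED) can then be handled separately.
    pipeline.run().waitUntilFinish();
  }

  static String decode(byte[] value) {
    return new String(value, java.nio.charset.StandardCharsets.UTF_8);
  }
}
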
0 votes · 0 answers
Apache Beam pipeline reading from Kafka
I have a pipeline which is consuming data from a Kafka topic (the topic uses compaction!). How can I terminate it after reading all messages? For example, stop emitting messages after x amount of time has passed since the last message and terminate the read…

ovod
- 49
- 1
- 7
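One commonly used workaround is to bound the read by a record count and/or wall-clock time, which turns KafkaIO into a bounded source so the pipeline can finish on its own (there is no exact "idle after the last message" setting shown here); the limits below are placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class KafkaBoundedReadSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("compacted-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Either or both of these bound the otherwise unbounded source:
            .withMaxNumRecords(1_000_000)
            .withMaxReadTime(Duration.standardMinutes(10)));
    pipeline.run().waitUntilFinish();
  }
}
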
0 votes · 1 answer
Fetch Truststore File Inside a Flex Template image for Confluent Kafka
We are trying to store the truststore.jks file inside the Flex Template Docker image, but when using it in the pipeline we are unable to locate it.
We tried pulling the image and we can see the file is present in the Docker container at \tmp\trust.jks, but while…

somnath chouwdhury
- 13
- 4
0 votes · 0 answers
Apache Beam KafkaIO - read messages from unbounded source and terminate pipeline
Which approaches should be used for reading all messages from a Kafka topic via KafkaIO read() and terminating the pipeline after that? Is withCheckStopReadingFn(function) suitable for that? Are there any other approaches?

ovod
- 49
- 1
- 7
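withCheckStopReadingFn is available in newer Beam releases (the exact signature varies by SDK version, hence the explicit cast below); a sketch using a placeholder wall-clock deadline as the stop condition:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class CheckStopReadingSketch {
  public static void main(String[] args) {
    // Placeholder stop condition: a fixed wall-clock deadline.
    final Instant deadline = Instant.now().plus(Duration.standardMinutes(10));

    Pipeline pipeline = Pipeline.create();
    pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Evaluated per partition while reading; returning true tells the
            // reader to stop consuming that partition.
            .withCheckStopReadingFn(
                (SerializableFunction<TopicPartition, Boolean>)
                    tp -> Instant.now().isAfter(deadline)));
    pipeline.run().waitUntilFinish();
  }
}
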
0 votes · 0 answers
ApacheBeamRunJavaPipelineOperator running Kafka source connection from Airflow worker instead of Dataflow worker even while using DataflowRunner
I am trying to run a Dataflow Java job that runs perfectly fine on the Dataflow runner when submitted without Composer. When the same job is run from Composer using DataflowRunner, somehow Composer is executing the Kafka connection on the Airflow worker host…

Rajesh Babu Devabhaktuni
- 19
- 1
- 9
0 votes · 0 answers
ReadFromKafka in Apache Beam Python SDK doesn't work: java.io.IOException: error=2, No such file or directory
I am trying to run a simple Beam program in Python which reads messages from a Kafka topic and prints them to the console, but I am getting this error and don't know what the issue is.
WARNING:root:Waiting for grpc channel to be ready at…

piby180
- 388
- 1
- 6
- 18
0 votes · 0 answers
Unknown Protocol : local with beam.io.kafka.ReadFromKafka
I want to read from a Kafka topic using Beam but get the following error. Any hints?
I can consume messages using the Kafka CLI perfectly fine.
RuntimeError: Pipeline unique-job-name_7b885f0f-c8fd-4763-bc1e-96817e714dac failed in state
FAILED:…

piby180
- 388
- 1
- 6
- 18
0 votes · 1 answer
Python Apache Beam SDK KafkaIO getting java.lang.RuntimeException: Failed to build transform kafka_read_without_metadata:v1
I try to run the following code snippet using the Apache Beam SDK for Python and get the java.lang.RuntimeException:
import apache_beam as beam
from apache_beam.io.external.kafka import ReadFromKafka
from apache_beam.io.external.kafka import…