Questions tagged [apache-beam-kafkaio]

61 questions
1 vote, 1 answer

Using the Beam Python SDK and PortableRunner to connect to Kafka with SSL

I have the code below for connecting to Kafka using the Python Beam SDK. I know that the ReadFromKafka transform runs in a Java SDK harness (Docker container), but I have not been able to figure out how to make ssl.truststore.location and…
1 vote, 1 answer

Estimating Watermark for Event Time in Beam

I'm trying to use Beam to aggregate over a set of data, using event time from the data and Kafka as the data source. This works if all my Kafka partitions are populated with data. However, as soon as a partition has not yet been written to, the watermark…
1 vote, 1 answer

Apache Beam KafkaIO consumers in consumer group getting assigned unique group id

I am running multiple instances of Apache Beam KafkaIO using the DirectRunner, all reading from the same topic. However, each message is delivered to every running instance. After checking the Kafka configuration, I found that the group name is getting appended with…
Aditya
1 vote, 1 answer

The RemoteEnvironment cannot be used when submitting a program through a client, or running in a TestEnvironment context

I was trying to run the Apache Beam word count example with Kafka as both input and output. However, on submitting the JAR to the Flink cluster, this error appeared: The RemoteEnvironment cannot be used when submitting a program through a client, or running in a…
1 vote, 0 answers

How to infer schema from Confluent Schema Registry using Apache Beam?

I'm trying to create an Apache Beam pipeline where I read from a Kafka topic and load it into BigQuery. Using Confluent's Schema Registry, I should be able to infer the schema when loading into BigQuery. However, the schema is not being inferred…
1 vote, 1 answer

How to specify Kafka brokers with KafkaIO in Apache Beam

I'm trying to set up a KafkaIO pipeline, but I can't figure out how to specify brokers. Specifying the broker name and port doesn't seem to do it. At no point am I specifying where my Kafka cluster is: pipeline .apply(KafkaIO.
artofdoe
1 vote, 1 answer

Apache Beam KafkaIO consumers in consumer group reading same message

I'm using KafkaIO in Dataflow to read messages from one topic. I use the following code: KafkaIO.read() .withReadCommitted() .withBootstrapServers(endPoint) …
0 votes, 0 answers

Capturing deserialization exceptions in KafkaIO

I have a typical KafkaIO-based source for reading Avro-formatted keys and values from a Kafka topic. PCollection> records = pipeline.apply( "Read from Kafka", KafkaIO.
pravish
0 votes, 0 answers

Apache Beam pipeline reading from Kafka

I have a pipeline that consumes data from a Kafka topic (the topic uses compaction!). How can I terminate it after reading all messages? For example, stop emitting messages once x amount of time has passed since the last message and terminate the read…
0 votes, 1 answer

Fetch Truststore File Inside a Flex Template image for Confluent Kafka

We are trying to store the truststore.jks file inside the Flex Template Docker image, but when using it in the pipeline we are unable to locate it. We tried pulling the image and can see the file is present in the image at \tmp\trust.jks, but while…
0 votes, 0 answers

Apache Beam KafkaIO - read messages from unbounded source and terminate pipeline

What approaches can be used to read all messages from a Kafka topic via KafkaIO read() and then terminate the pipeline? Is withCheckStopReadingFn(function) suitable for that? Are there any other approaches?
ovod
0 votes, 0 answers

BeamRunJavaPipelineOperator running the Kafka source connection on the Airflow worker instead of the Dataflow worker, even when using DataflowRunner

I am trying to run a Dataflow Java job that runs perfectly fine on the Dataflow runner when submitted without Composer. When the same job is launched from Composer using DataflowRunner, Composer somehow executes the Kafka connection on the Airflow worker host…
0 votes, 0 answers

ReadFromKafka in Apache Beam python SDK doesn't work : java.io.IOException: error=2, No such file or directory

I am trying to run a simple Beam program in Python that reads messages from a Kafka topic and prints them to the console, but I am getting this error and don't know what the issue is: WARNING:root:Waiting for grpc channel to be ready at…
0 votes, 0 answers

Unknown Protocol : local with beam.io.kafka.ReadFromKafka

I want to read from a Kafka topic using Beam but get the following error. Any hints? I can consume messages using the Kafka CLI perfectly fine. RuntimeError: Pipeline unique-job-name_7b885f0f-c8fd-4763-bc1e-96817e714dac failed in state FAILED:…
0 votes, 1 answer

Python Apache Beam SDK KafkaIO getting java.lang.RuntimeException: Failed to build transform kafka_read_without_metadata:v1

I try to run the following code snippet using the Apache Beam SDK for Python and get the java.lang.RuntimeException: import apache_beam as beam from apache_beam.io.external.kafka import ReadFromKafka from apache_beam.io.external.kafka import…