Questions tagged [spark-kafka-integration]

Use this tag for any Spark-Kafka integration. This tag should be used for both batch and stream processing while also covering Spark Streaming (DStreams) and Structured Streaming.

This tag is related to the spark-streaming-kafka and spark-sql-kafka libraries.

External sources:

- Spark Streaming + Kafka Integration Guide (official Apache Spark documentation)
- Structured Streaming + Kafka Integration Guide (official Apache Spark documentation)

To make your question more precise, consider adding your code, the Spark and Kafka versions you use, and the full error message or stack trace.
This tag serves as a synonym for the existing (low-traffic) spark-streaming-kafka tag, which focuses only on Spark Streaming (neither batch nor Structured Streaming).

96 questions
1
vote
1 answer

Rewind and reconsume offset in structured streaming from Kafka

Is there a way we can rewind the offsets in Structured Streaming? I am using Spark version 3 and have configured startingOffsets as earliest, and every restart after that picks up the offsets from the checkpoint directory. For example:…
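A minimal sketch of one common approach (topic name, offsets, and paths are assumptions): startingOffsets only applies on the very first run of a query, so rewinding usually means pointing the query at a fresh checkpoint directory and passing explicit per-partition offsets.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("rewind-example").getOrCreate()

// startingOffsets takes a JSON map of topic -> partition -> offset;
// -2 means earliest, -1 means latest.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .option("startingOffsets", """{"events":{"0":42,"1":-2}}""")
  .load()

df.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/rewind-v2") // fresh directory, so the old checkpoint no longer wins
  .start()
```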
1
vote
2 answers

Send data to Kafka topics based on a condition in Dataframe

I want to change the destination Kafka topic depending on the value of the data in Spark Streaming. Is it possible to do so? When I tried the following code, it only executes the first one, but does not execute the lower…
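The Kafka sink reads the target topic from a per-row "topic" column when no topic option is set on the writer; a sketch of that pattern (column and topic names are assumptions):

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// Route each row to a topic via a computed "topic" column.
val routed = df
  .withColumn("topic",
    when(col("status") === "error", lit("errors-topic"))
      .otherwise(lit("events-topic")))
  .selectExpr("topic", "CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

routed.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("checkpointLocation", "/tmp/checkpoints/routing")
  .start()
```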
1
vote
1 answer

Writing Spark DataFrame to Kafka is ignoring the partition column and kafka.partitioner.class

I am trying to write a Spark DataFrame (batch DF) to Kafka and I need to write the data to specific partitions. I tried the following code: myDF.write .format("kafka") .option("kafka.bootstrap.servers", kafkaProps.getBootstrapServers) …
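Recent Spark versions document an optional integer "partition" column on the Kafka sink; a sketch assuming that support is available in the version at hand (the topic name and partition value are assumptions):

```scala
import org.apache.spark.sql.functions.lit

myDF
  .withColumn("partition", lit(3)) // fixed target partition per row
  .selectExpr("CAST(key AS STRING) AS key",
              "CAST(value AS STRING) AS value",
              "partition")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", kafkaProps.getBootstrapServers)
  .option("topic", "my-topic")
  .save()
```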
1
vote
1 answer

kafka-consumer-groups command doesn't show LAG and CURRENT-OFFSET for Spark Structured Streaming applications (consumers)

I have a Spark Structured Streaming application consuming from Kafka, and I would like to monitor the consumer lag for it. I'm using the below command to check consumer lag; however, I don't get the CURRENT-OFFSET, and hence LAG is blank too.…
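This behavior is expected insofar as Structured Streaming tracks offsets in its checkpoint rather than committing them to Kafka, so the CLI has nothing to report. A hedged sketch of monitoring progress from inside the application instead:

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Progress events carry per-source startOffset/endOffset for the Kafka source.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    println(event.progress.json)
})
```

Note that setting option("kafka.group.id", ...) (Spark 3.0+) pins a stable group id, but Spark still does not commit offsets to it, so kafka-consumer-groups keeps showing blanks either way.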
1
vote
0 answers

Logging with Spark/Kafka stream processing application

I'm new to working in Scala with the Spark and Kafka integration, and I'm running into an issue with logging. I have tried many different logging libraries, but they all return the same error from Spark. The error is the following: Exception in…
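The error text is truncated above, but a frequent cause when logging inside Spark jobs is capturing a non-serializable logger in a task closure. A sketch of the usual workaround, assuming that is the failure here:

```scala
import org.apache.log4j.Logger

object StreamJob extends Serializable {
  // @transient lazy: the logger is not shipped with the closure and is
  // re-created on each executor instead.
  @transient lazy val log: Logger = Logger.getLogger(getClass.getName)

  def process(value: String): String = {
    log.info(s"processing record: $value")
    value.toUpperCase
  }
}
```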
1
vote
3 answers

Spark 3 structured streaming use maxOffsetsPerTrigger in Kafka source with Trigger.Once

We need to use maxOffsetsPerTrigger in the Kafka source with Trigger.Once() in Structured Streaming, but based on this issue it seems Spark 3 reads allAvailable. Is there a way to achieve rate limiting in this situation? Here is a sample code in…
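On Spark 3.3+, Trigger.AvailableNow() is designed for exactly this case: it drains all available data like Trigger.Once() but in multiple micro-batches that respect maxOffsetsPerTrigger. A sketch (topic and paths are assumptions):

```scala
import org.apache.spark.sql.streaming.Trigger

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .option("maxOffsetsPerTrigger", "10000") // cap per micro-batch
  .load()

stream.writeStream
  .format("parquet")
  .option("path", "/data/out")
  .option("checkpointLocation", "/data/checkpoints/drain")
  .trigger(Trigger.AvailableNow())
  .start()
```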
0
votes
1 answer

How to add Kafka dependencies for PySpark on a Jupyter notebook

I have set up Kafka 2.1 on Windows and am able to successfully communicate a topic from producer to consumer over localhost:9092. I now want to consume this in a Spark structured stream. For this I set up Spark 3.4 and installed PySpark over Jupyter…
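The Kafka source is not bundled with Spark itself; one way is to pull it in via the spark.jars.packages config before the session starts (the config key is the same in PySpark). The version coordinates below are assumptions and must match the Spark/Scala build in use:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("kafka-notebook")
  // Maven coordinates: the artifact's Scala and Spark versions must match the runtime.
  .config("spark.jars.packages",
          "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1")
  .getOrCreate()
```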
0
votes
0 answers

Spark Kafka: understanding offset management with enable.auto.commit

According to the Kafka documentation, offsets in Kafka can be managed using enable.auto.commit and auto.commit.interval.ms. I have difficulties understanding the concept. For example, I have a Kafka topic that shall be batch-loaded every day and shall only load…
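For context, Spark's Kafka source ignores enable.auto.commit entirely and manages offsets itself; for a batch read the range is bounded explicitly. A sketch (topic name is an assumption):

```scala
val batch = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "daily-load")
  .option("startingOffsets", "earliest") // or a JSON map of explicit offsets
  .option("endingOffsets", "latest")
  .load()
```

To pick up only new data on each daily run, the application would persist the last read offsets itself and pass them back as the next run's startingOffsets.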
0
votes
0 answers

Spark Streaming: How to handle failure in Spark when connecting to multiple Kafka clusters via a union of DStreams?

I have a requirement where I have to read from multiple Kafka clusters (more than 20 clusters) via Spark Streaming. I am able to read all of them by creating a Kafka direct stream per cluster and performing a union on…
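A sketch of the union pattern for reference (broker lists, topic, and the existing StreamingContext ssc are assumptions); note that a failure in any one input stream stops the whole StreamingContext, so isolating clusters generally means running separate applications:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val clusters = Seq("broker-a:9092", "broker-b:9092") // one entry per Kafka cluster
val streams = clusters.map { brokers =>
  val params = Map[String, Object](
    "bootstrap.servers"  -> brokers,
    "key.deserializer"   -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id"           -> "multi-cluster-reader")
  KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("events"), params))
}
val unioned = ssc.union(streams) // single DStream over all clusters
```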
0
votes
1 answer

Running Kafka and Spark with docker-compose

My goal is to send/produce a txt file from my Windows PC to a container running Kafka, to then be consumed by PySpark (running in another container). I'm using docker-compose, where I define a custom network and several containers, such as: spark-master, two…
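Inside the compose network, containers must address the broker by its service name rather than localhost, and the broker's advertised listener has to return that same name. A sketch of the consumer side (service and topic names are assumptions):

```scala
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka:9092") // compose service name, not localhost
  .option("subscribe", "txt-lines")
  .load()
```

The producer on the Windows host typically needs a second listener advertised as localhost on a published port, so host and containers each resolve an address that works from their side of the network.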
0
votes
1 answer

How to map a message to an object with `schema` and `payload` in Spark structured streaming correctly?

I am hoping to map a message to an object with schema and payload fields inside it during Spark structured streaming. This is my original code: val input_schema = new StructType() .add("timestamp", DoubleType) .add("current", DoubleType) …
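One way is to wrap the payload schema in an outer struct and parse with from_json; a sketch building on the excerpt's input_schema (kafkaDF and the string-typed schema field are assumptions):

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val input_schema = new StructType()
  .add("timestamp", DoubleType)
  .add("current", DoubleType)

// Outer envelope: "schema" is assumed to arrive as a raw JSON string here.
val envelope = new StructType()
  .add("schema", StringType)
  .add("payload", input_schema)

val parsed = kafkaDF
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json(col("json"), envelope).as("msg"))
  .select("msg.payload.*") // flatten the payload fields into columns
```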
0
votes
1 answer

Getting an error for org.apache.spark.sql.Encoder and "missing or invalid dependency detected while loading class file" for SQLImplicits, LowPrioritySQLImplicits

I am running the following code to read a Kafka stream with Spark 3.2.2 and Scala 2.12.0. Earlier the same code was working fine with Spark 2.2 and Scala 2.11.8. import spark.implicits._ val kafkaStream = spark .readStream .format("kafka") …
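This compiler message usually points to mismatched Scala or Spark binary versions on the classpath. A build.sbt sketch of the usual fix (exact versions are assumptions; they just need to agree with each other and with the cluster):

```scala
// Spark 3.2.x is built against a late Scala 2.12 patch release, so pin
// the compiler accordingly and keep all Spark artifacts on one version.
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "3.2.2" % Provided,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.2.2"
)
```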
0
votes
0 answers

What is the best way to set up a Kafka connection with Apache Spark

How can we make the Kafka stream more stable, so that it runs constantly without us having to start the run again after it fails? (So far we are thinking about using the "continuous" run mode to make it automatically start a new run even after a…
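Besides the continuous trigger, a common pattern is simply supervising the query and restarting it on failure, letting the checkpoint resume from the last committed offsets. A minimal sketch (df, topic, and paths are assumptions):

```scala
// Restart loop: each iteration resumes from the checkpoint.
while (true) {
  val query = df.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "out")
    .option("checkpointLocation", "/tmp/checkpoints/stable")
    .start()
  try query.awaitTermination()
  catch { case e: Exception => println(s"query failed, restarting: ${e.getMessage}") }
}
```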
0
votes
0 answers

Spark Kafka error while publishing data to a Kafka topic

I am getting the below error while publishing (writeStream) DataFrame data to a Kafka topic. Can you please guide me here?
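The actual error is not shown in the excerpt, but for reference, the Kafka sink requires a value column castable to string or binary plus a checkpoint location, and omitting either fails at start(). A minimal working shape (column and topic names are assumptions):

```scala
df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "output-topic")
  .option("checkpointLocation", "/tmp/checkpoints/publish")
  .start()
```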
0
votes
0 answers

How to read a DF column of struct type and add the key-value pairs to Kafka headers?

I have a new DataFrame with 2 columns: one is headers and the other is a payload. I am facing issues reading the headers column and assigning the values to Kafka headers while publishing. Earlier the DataFrame had 4 columns, as in the old DF schema: Id -…
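The Kafka sink (Spark 3.0+) writes an optional headers column typed array&lt;struct&lt;key: string, value: binary&gt;&gt;; a sketch converting a struct column into that shape (column and field names are assumptions):

```scala
import org.apache.spark.sql.functions.expr

val out = df.select(
  expr("CAST(payload AS STRING) AS value"),
  // One named_struct per header entry; adapt the field list to the real struct.
  expr("""array(
            named_struct('key', 'source',  'value', CAST(headers.source  AS BINARY)),
            named_struct('key', 'traceId', 'value', CAST(headers.traceId AS BINARY))
          ) AS headers"""))

out.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "with-headers")
  .save()
```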