Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
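As a rough illustration of the 1:1 correspondence and offset access described above, here is a minimal plain-Python sketch that does not use Spark at all. The `OffsetRange` class and `plan_batch` function are hypothetical stand-ins written for this example (the field names only loosely mirror the topic, partition, from-offset and until-offset values that a direct stream exposes); the topic name and offsets are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for the per-partition offset range of one batch."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

    def count(self) -> int:
        # Number of records covered by this range.
        return self.until_offset - self.from_offset


def plan_batch(topic, committed, latest):
    """Build one OffsetRange per Kafka partition for the next batch.

    `committed` and `latest` map partition id -> offset. In the direct
    approach each such range would be read by exactly one Spark partition.
    """
    return [
        OffsetRange(topic, p, committed.get(p, 0), latest[p])
        for p in sorted(latest)
    ]


# Assumed example values: three Kafka partitions give three ranges.
ranges = plan_batch("events", committed={0: 10, 1: 7}, latest={0: 15, 1: 7, 2: 3})
for r in ranges:
    print(r.topic, r.partition, r.from_offset, r.until_offset, r.count())
```

Because every range records where it starts and ends, a job that keeps these values can tell exactly which records a given batch covered.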
Questions tagged [spark-streaming-kafka]
250 questions
2
votes
2 answers
Deserializing Spark structured stream data from Kafka topic
I am working off Kafka 2.3.0 and Spark 2.3.4. I have already built a Kafka Connector which reads off a CSV file and posts a line from the CSV to the relevant Kafka topic. The line is like…

Sushrut J Mair
- 131
- 6
2
votes
0 answers
Kafka Spark Streaming failing with source KAFKA not found
I am trying to stream a producer topic from Kafka, but I get an error saying that Kafka is not a valid data source.
I imported all the required packages, such as Kafka SQL streaming.
build.gradle file:
dependencies {
compile group: 'org.apache.kafka',…

Raptor0009
- 258
- 4
- 14
2
votes
0 answers
How to build a dataframe that's a join of two Kafka streams with "key(s)" column and remaining columns being the latest values
My Spark 2.4.x (pyspark) app requires:
Inputs are two Kafka topics and output is a Kafka topic
A "streaming table" where
there's a logical key(s) and
remaining columns should be latest values from either stream(s).
Sub-second latency.…

Venki
- 417
- 4
- 7
2
votes
0 answers
Exactly once with Kafka + Spark Streaming
Is it possible to achieve exactly-once semantics when consuming a Kafka topic in a Spark Streaming application?
To achieve exactly once you need the following things:
Exactly once on Kafka producer to Kafka broker. This is achieved by Kafka's 0.11 idempotent…

VB_
- 45,112
- 42
- 145
- 293
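The excerpt above is truncated, so the following is not the asker's approach. It is only a minimal sketch of one commonly described consumer-side technique: writing results and the consumed offset in the same database transaction, so that a replayed batch is detected and skipped. It uses Python's built-in `sqlite3` instead of Spark or Kafka, and the table names, the `process_batch` function and the sample records are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (word TEXT PRIMARY KEY, total INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE offsets (topic TEXT, part INTEGER, until_offset INTEGER, "
    "PRIMARY KEY (topic, part))"
)


def process_batch(conn, topic, part, from_offset, records):
    """Apply a batch and advance the stored offset in one transaction.

    Returns False, writing nothing, if the batch does not start at the
    stored offset (for example because it was already applied).
    """
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT until_offset FROM offsets WHERE topic = ? AND part = ?",
            (topic, part),
        ).fetchone()
        stored = row[0] if row else 0
        if from_offset != stored:
            return False  # replayed or out-of-order batch: skip it
        for word in records:
            conn.execute("INSERT OR IGNORE INTO results (word, total) VALUES (?, 0)", (word,))
            conn.execute("UPDATE results SET total = total + 1 WHERE word = ?", (word,))
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, until_offset) VALUES (?, ?, ?)",
            (topic, part, from_offset + len(records)),
        )
    return True


# The same batch delivered twice is counted only once.
first = process_batch(conn, "events", 0, 0, ["a", "b", "a"])
replay = process_batch(conn, "events", 0, 0, ["a", "b", "a"])
totals = dict(conn.execute("SELECT word, total FROM results"))
print(first, replay, totals)
```

The design choice here is that the offset table is the single source of truth for progress: since it changes atomically with the results, a crash between the two writes cannot leave them out of sync.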
2
votes
0 answers
Spark SQL streaming with Kafka on JSON data: function from_json not able to parse multiline JSON coming from Kafka topic
Here, I am sending JSON data to Kafka via the "test" topic, applying a schema to the JSON, doing some transformations, and printing the result to the console.
Here is the code:
val kafkadata = spark
.readStream
.format("kafka")
…

Radhika
- 21
- 1
2
votes
1 answer
Spark continuous processing mode does not read all Kafka topic partitions
I'm experimenting with Spark's Continuous Processing mode in Structured Streaming and I'm reading from a Kafka topic with 2 partitions while the Spark application has only one executor with one core.
The application is a simple one where it simply…

M-Doru
- 71
- 7
2
votes
1 answer
Manually commit offset in Kafka Direct Stream in Python
I am porting a streaming application written in Scala to Python. I want to manually commit offsets for a DStream. This is done in Scala like below:
stream = KafkaUtils.createDirectStream(someConfigs)
stream.foreachRDD { rdd =>
val offsetRanges =…

Girish Gupta
- 1,241
- 13
- 27
2
votes
2 answers
Spark Structured Streaming Kafka Microbatch count
I am using Spark Structured Streaming to read records from a Kafka topic; I want to count the number of records received in each micro-batch of the Spark readStream.
This is a snippet:
val kafka_df = sparkSession
.readStream
.format("kafka")
…

irrelevantUser
- 1,172
- 18
- 35
2
votes
1 answer
Spark Streaming Kafka offset management
I have been running Spark Streaming jobs that consume and produce data through Kafka. I used a direct DStream, so I had to manage offsets myself; we adopted Redis to write and read offsets. Now there is one problem: when I launched my client, my client…

Frank
- 977
- 3
- 14
- 35
2
votes
1 answer
Spark Streaming Kafka ConcurrentModificationException
I am using a Spark Streaming application. The application reads messages from a Kafka topic (with 200 partitions) using a direct stream. Occasionally the application throws a ConcurrentModificationException:
java.util.ConcurrentModificationException:…

scorpio
- 329
- 1
- 18
2
votes
1 answer
Spark Streaming Kafka java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer
I am using Spark Streaming with the Kafka integration. When I run the streaming application from my IDE in local mode, everything works like a charm. However, as soon as I submit it to the cluster, I keep getting the following error:…

MaatDeamon
- 9,532
- 9
- 60
- 127
2
votes
1 answer
Spark Streaming, Kafka: java.lang.StackOverflowError
I am getting the error below in a Spark Streaming application that uses Kafka as its input stream. With a socket source it worked fine, but after I switched to Kafka it throws this error. Does anyone have an idea why it is throwing the error? Do I need to…

Vibhuti
- 1,584
- 16
- 19
2
votes
0 answers
Spark Streaming with Kafka
I want to do simple machine learning in Spark.
First the application should do some learning from historical data from a file, train the machine learning model and then read input from kafka to give predictions in real time. To do that I believe I…

Mr M.
- 715
- 1
- 8
- 24
2
votes
2 answers
spark-submit failed with Spark Streaming wordcount Python code
I just copied the Spark Streaming wordcount Python code and used spark-submit to run it on a Spark cluster, but it shows the following errors:
py4j.protocol.Py4JJavaError: An error occurred while calling o23.loadClass.
:…

Jack
- 5,540
- 13
- 65
- 113
1
vote
0 answers
Spark Streaming query dependency
I have a use case where two Spark streaming queries are running and consuming data from the same topic. I want a guarantee that Query1 consumes and processes the data from the topic before Query2 does.
Is there a way I can achieve this in…

V Dineshkumar
- 86
- 6