Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
Questions tagged [spark-streaming-kafka]
250 questions
1
vote
0 answers
How to store data into HDFS using Spark Streaming
I want to store streaming data in HDFS. It's Spark Streaming code that captures data from a Kafka topic.
I tried this:
lines.saveAsHadoopFiles("hdfs://192.168.10.31:9000/user/spark/mystream/", "abc")
This is my code; let me know where to write code for…

Arun
- 41
- 2
- 9
1
vote
1 answer
Spark arbitrary stateful stream aggregation, flatMapGroupsWithState API
I'm a Spark developer with 10 days of experience, trying to understand Spark's flatMapGroupsWithState API.
As I understand:
We pass 2 options to it which are timeout configuration. A possible value is GroupStateTimeout.ProcessingTimeTimeout i.e. kind of an…

Gaurav Kumar
- 1,091
- 13
- 31
1
vote
0 answers
How to read JSON data using Python from a Kafka topic in Apache Spark
I am new to Spark. Could you please let me know how to read JSON data from a Kafka topic using Python in Apache Spark?
I tried the following code, which worked for single-line JSON but not for multiline JSON:
kafkaStream = KafkaUtils.createStream(ssc,…

Amarjeet Singh
- 23
- 10
1
vote
0 answers
Spark Streaming tasks fail with NullPointerException
I'm running a Spark job with kafka-clients:2.2.1 and
spark-streaming-kafka-0-10_2.11:2.4.3
The job is running in the following mode:
val scc: StreamingContext = new StreamingContext(spark, "180 seconds")
val kafkaSink: Broadcast[KafkaSink] = ...
val…

Julias
- 5,752
- 17
- 59
- 84
1
vote
0 answers
Spark Streaming context not starting
I'm trying to read data from a Kafka topic using Spark Streaming. Below are the code and libraries I am using. The code looks fine, but ssc.start() hangs without printing any ERROR or INFO. Any pointers on the issue would be a great…

Chhaya Vishwakarma
- 1,407
- 9
- 44
- 72
1
vote
1 answer
How to fix logging and version compatibility in spark-submit of jar file
I'm trying to submit a jar file for execution on the Spark engine. I'm trying to integrate Spark with Kafka, using Eclipse to build and export the jar file of the sample code https://github.com/apache/spark/tree/v2.1.1/examples
I got two exceptions:…

Ahmad Alhilal
- 444
- 4
- 19
1
vote
1 answer
spark-submit error Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
In my spark program, I have this code:
val df = spark.readStream
.format("kafka")
.option("subscribe", "raw_weather")
.option("kafka.bootstrap.servers", "s of my brokers")
.option("kafka.security.protocol",…

Sparker0i
- 1,787
- 4
- 35
- 60
1
vote
1 answer
Unable to read data from Kafka topics using Spark Streaming
I am trying to read data from a Kafka topic using Spark Streaming. I am able to produce messages into the Kafka topic, but while reading the data from the topic using Spark Streaming I get the error message given below:
ERROR…

Pramod
- 113
- 2
- 14
1
vote
0 answers
Spark Streaming readStream unable to read from secure Kafka (EventStreams)
I'm trying to send data from a program to a secure Kafka cluster (EventStreams on IBM Cloud - Cloud Foundry Services), then in my consumer application (which uses Spark Streaming), I'm trying to read the data from the same Kafka source.
Here are…

Sparker0i
- 1,787
- 4
- 35
- 60
1
vote
1 answer
Kafka - Spark Streaming Integration: DStreams and Task reuse
I am trying to understand the internals of Spark Streaming (not Structured Streaming), specifically the way tasks see the DStream. I am going over the source code of Spark in scala, here. I understand the call stack:
ExecutorCoarseGrainedBackend…

Sheel Pancholi
- 621
- 11
- 25
1
vote
0 answers
What could be the reasons for a Spark job to get stuck
I'm running a Spark 2.3.1 standalone cluster.
My job consumes mini-batches from Kafka every 2 minutes and writes aggregations to some store. The job looks like the following:
val stream = KafkaUtils.createDirectStream(...)
stream.foreachRDD { rdd =>
…

Julias
- 5,752
- 17
- 59
- 84
1
vote
0 answers
ERROR Utils: Uncaught exception in thread stdout writer for python
I use Spark 2.4.0 with Python and read data from kafka_2.11-2.0.0 (binary, not source). I'm using spark-submit --jars sspark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar script.py and an error message appears in the error report. If anyone can help…

Rad304
- 115
- 1
- 13
1
vote
1 answer
How to create connection(s) to a data source in Spark Streaming for lookups
I have a use case where we are streaming events, and for each event I have to do some lookups. The lookups are in Redis, and I am wondering what the best way to create the connections is. The Spark Streaming job would run 40 executors and I have 5 such…

user3679686
- 516
- 1
- 6
- 20
1
vote
0 answers
Spark interaction with Kafka with two different principals
I have the following question.
I'm using a Spark Structured Streaming job which reads from one topic and writes to another topic of the same kerberized Kafka cluster. Everything works fine.
But my problem is the following: How would I handle the…

Neven
- 123
- 6
1
vote
2 answers
Deserialize Kafka json message with PySpark Streaming
I have a PySpark application that consumes messages from a Kafka topic; these messages are serialized by org.apache.kafka.connect.json.JsonConverter. I'm using the Confluent Kafka JDBC connector to do this.
The issue is, when I consume the messages,…

anonuser1234
- 511
- 2
- 11
- 24