Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
Questions tagged [spark-streaming-kafka]
250 questions
1 vote, 1 answer
How to parse a JSON string column in PySpark's DataStreamReader and create a DataFrame
I am reading messages from a Kafka topic:
messageDFRaw = spark.readStream\
.format("kafka")\
.option("kafka.bootstrap.servers", "localhost:9092")\
.option("subscribe", "test-message")\
…

pulsar
1 vote, 3 answers
How to include the Kafka timestamp value as a column in Spark Structured Streaming?
I am looking for a way to add the Kafka timestamp value to my Spark Structured Streaming schema. I have extracted the value field from Kafka and built a DataFrame from it. My issue is that I also need to get the timestamp field (from Kafka) along with…

BigD
1 vote, 1 answer
Store Message Offset in Kafka using KafkaUtils.createDirectStream
How do I store the message offset in Kafka if I am using KafkaUtils.createDirectStream to read the messages?
Kafka loses the offset value every time the application goes down. It then reads the value provided in auto.offset.reset (which is latest)…

user1326784
1 vote, 0 answers
Spark streaming slow down
In our Spark app we're consuming a Kafka stream and storing the data in Cassandra.
First, we ran the stream without backpressure and saw an odd anomaly: processing time was constant at about 1 minute, but the scheduling delay was…

Tomas Bartalos
1 vote, 3 answers
Array of JSON to DataFrame in Spark received from Kafka
I'm writing a Spark application in Scala using Spark Structured Streaming that receives JSON-formatted data from Kafka. The application can receive either a single JSON object or multiple JSON objects formatted in this…

Vinc
1 vote, 0 answers
Spark Streaming from Apache Kafka
I came across the following:
For possible kafkaParams, see Kafka consumer config docs. If your
Spark batch duration is larger than the default Kafka heartbeat
session timeout (30 seconds), increase heartbeat.interval.ms and
session.timeout.ms…

Abdul Rahman
1 vote, 1 answer
ExceptionInInitializerError Spark Streaming Kafka
I am trying to connect Spark Streaming to Kafka in a simple application. I based this application on the example from the Spark documentation. When I try to run it, I get the following exception:
Exception in thread "main"…

Cassie
1 vote, 0 answers
Spark Streaming Kafka MongoDB timeout exception
I am new to Spark Streaming and am trying to implement a Kafka and MongoDB integration in which my code pulls JSON data from a Kafka topic and inserts it into MongoDB. Below is my code:
package com.streams.sparkmongo
import…

user8363477
1 vote, 0 answers
Issue storing offsets in Kafka for a Spark Streaming application
In our cluster we have Kafka 0.10.1 and Spark 2.1.0. The Spark Streaming application works fine with the checkpointing mechanism (checkpoints on HDFS). However, we noticed that when using checkpoints, the streaming application does not restart if there is a…

java_enthu
1 vote, 0 answers
spark-streaming-kafka-0-10: KafkaConsumer is not safe for multi-threaded access
I am trying to move from spark-streaming-kafka-0-8 to spark-streaming-kafka-0-10 and ran into the following error:
KafkaConsumer is not safe for multi-threaded access
We have multiple Kafka clusters in different DCs that I want to consume…

Roman Studenikin
1 vote, 2 answers
Kafka createDirectStream in Spark Streaming
I'm trying the example code from the Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher). The code runs without any error, but I do not receive any records. If I run kafka-console-consumer.sh --from-beginning, I can get…

rsmin
1 vote, 1 answer
Spark Streaming Kafka Kerberos
I'm working on a PoC Spark Streaming job that pulls from Kafka. The same code works against a non-secured Kafka 0.10 cluster, but when I switch to an SSL/Kerberos (HDP 2.5) setup, I get an exception:
Caused by:…

Bill Schwanitz
1 vote, 1 answer
Spark Streaming + Kafka sbt compilation
I have an example with Spark Streaming + Kafka. It works well from the IDE, but when I try to compile it with sbt from the console (sbt compile), I get an error.
The Main class:
val conf = new…

scala
1 vote, 1 answer
Spark Streaming + Kafka "JobGenerator" java.lang.NoSuchMethodError
I'm new to Spark Streaming and Kafka, and I don't understand this runtime exception. I've already set up the Kafka server.
Exception in thread "JobGenerator" java.lang.NoSuchMethodError:…

Charles Jacquet
1 vote, 1 answer
DSE Spark Streaming+Kafka NoSuchMethodError
I am trying to submit a Spark Streaming + Kafka job that just reads lines of strings from a Kafka topic. However, I am getting the following exception:
15/07/24 22:39:45 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
…

delta313