Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
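A minimal direct-stream consumer illustrating those properties, assuming the spark-streaming-kafka-0-10 artifact on the classpath and a broker at localhost:9092 (both assumptions):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("direct-stream-example")
val ssc  = new StreamingContext(conf, Seconds(1))

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",      // assumed broker address
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group",       // assumed group id
  "auto.offset.reset"  -> "latest"
)

// One Spark partition per Kafka partition; per-batch offsets are exposed.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams))

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach(o => println(s"${o.topic} ${o.partition} ${o.fromOffset} -> ${o.untilOffset}"))
}
```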
Questions tagged [spark-streaming-kafka]
250 questions
3
votes
1 answer
How to optimize number of executor instances in spark structured streaming app?
Runtime
YARN cluster mode
Application
Spark structured streaming
Read data from Kafka topic
About Kafka topic
1 topic with 4 partitions, for now (the number of partitions can be changed).
At most 2,000 records are added to the topic per second.
I've found…

nullmari
- 442
- 5
- 16
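Kafka read parallelism is capped by the partition count (4 here), so extra executors beyond that do not add read tasks; they help only with downstream work after a repartition. A starting-point sizing sketch, in the style of a YARN spark-submit (every number is an assumption to be tuned against observed batch duration):

```shell
# 4 Kafka partitions => at most 4 parallel read tasks.
# Start small; scale out only if processing time approaches the trigger interval.
spark-submit \
  --master yarn --deploy-mode cluster \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 2g \
  my-streaming-app.jar   # placeholder artifact name
```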
3
votes
1 answer
Spark streaming and kafka Missing required configuration "partition.assignment.strategy" which has no default value
I am trying to run a Spark Streaming application with Kafka on YARN. I am getting the following stack trace error:
Caused by: org.apache.kafka.common.config.ConfigException: Missing required configuration "partition.assignment.strategy" which…

Y0gesh Gupta
- 2,184
- 5
- 40
- 56
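The usual fix for this error is to set the assignment strategy explicitly in the consumer params, since some shaded Kafka client builds on YARN fail to pick up the default. The class below is Kafka's stock range assignor; broker and group id are illustrative:

```scala
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",   // assumed broker
  "group.id"          -> "example-group",    // assumed group id
  // Explicitly name the assignor so the client does not fail with
  // Missing required configuration "partition.assignment.strategy"
  "partition.assignment.strategy" ->
    "org.apache.kafka.clients.consumer.RangeAssignor"
)
```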
3
votes
2 answers
How to read from specific Kafka partition in Spark structured streaming
I have three partitions for my Kafka topic and I was wondering if I could read from just one partition out of the three. My consumer is a Spark Structured Streaming application.
Below are my existing Kafka settings in Spark.
val inputDf =…

hampi2017
- 701
- 2
- 13
- 33
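Structured Streaming can pin a query to specific partitions with the `assign` source option (mutually exclusive with `subscribe`), which takes a JSON map from topic to a partition array. A sketch, with broker and topic name assumed:

```scala
val inputDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("assign", """{"my-topic":[0]}""")            // read only partition 0 of 3
  .option("startingOffsets", "earliest")
  .load()
```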
3
votes
1 answer
Spark Kafka streaming in spark 2.3.0 with python
I recently upgraded to Spark 2.3.0. I had an existing Spark job which used to run on Spark 2.2.0.
I am facing a Java AbstractMethodError exception.
My simple code:
from pyspark import SparkContext …

ajay_t
- 2,347
- 7
- 37
- 62
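An AbstractMethodError right after an upgrade usually means the Kafka integration jar was built against the old Spark version. One hedged fix is to align the `--packages` coordinate with the new Spark release (0-8 is the artifact that supported the Python DStream API; the script name is a placeholder):

```shell
# Align the Kafka integration artifact with Spark 2.3.0
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.0 \
  my_job.py   # placeholder script name
```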
3
votes
1 answer
Spark Streaming Kafka Stream batch execution
I'm new to Spark Streaming and I have a general question about its usage. I'm currently implementing an application which streams data from a Kafka topic.
Is it a common scenario to use the application to run a batch only one time, for…

Vik
- 324
- 3
- 9
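Running one batch over whatever is currently in the topic and then stopping is directly supported by Structured Streaming's one-shot trigger. A sketch, with broker, topic, and paths all assumed:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("run-once").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("subscribe", "example-topic")                // assumed topic
  .load()

val query = df.writeStream
  .format("parquet")
  .option("path", "/data/out")                 // assumed output path
  .option("checkpointLocation", "/data/chk")   // assumed checkpoint path
  .trigger(Trigger.Once())                     // drain available data once, then stop
  .start()

query.awaitTermination()
```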
3
votes
0 answers
Spark Streaming - Kafka - java.nio.BufferUnderflowException
I'm running into the below error while trying to consume messages from Kafka through Spark Streaming (Kafka direct API). This used to work OK when using the Spark standalone cluster manager. We just switched to Cloudera 5.7 using YARN to manage Spark…

codehammer
- 876
- 2
- 10
- 27
3
votes
1 answer
Spark Streaming Kafka createDirectStream - Spark UI shows input event size as zero
I have implemented Spark Streaming using createDirectStream. My Kafka producer is sending several messages every second to a topic with two partitions.
On the Spark Streaming side, I read the Kafka messages every second and then I'm windowing them over 5…

Sudheer Palyam
- 2,499
- 2
- 23
- 28
2
votes
1 answer
Spark not giving equal tasks to all executors
I am reading from a Kafka topic which has 5 partitions. Since 5 cores are not sufficient to handle the load, I am repartitioning the input to 30. I have given 30 cores to my Spark process, with 6 cores on each executor. With this setup I was…

best wishes
- 5,789
- 1
- 34
- 59
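With 5 Kafka partitions there are only 5 read tasks regardless of core count; a repartition right after the source spreads the downstream work across all cores. A sketch mirroring the question's numbers (broker and topic are assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rebalance").getOrCreate()

val inputDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("subscribe", "example-topic")                // assumed topic
  .load()

// 5 Kafka partitions => 5 read tasks; spread subsequent stages over all 30 cores
val rebalanced = inputDf.repartition(30)
```

Note that the shuffle introduced by `repartition` has its own cost; it only pays off when per-record processing dominates.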
2
votes
0 answers
Spark structured streaming how to write to Kafka in Protobuf format
Spark: 3.0.0
Scala: 2.12
Confluent
I have a Spark Structured Streaming job and am looking for an example of writing data frames to Kafka in Protobuf format.
I read messages from PostgreSQL and after doing all the transformations have a data frame…

JDev
- 1,662
- 4
- 25
- 55
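Spark 3.0 has no built-in Protobuf functions (those arrived later, in Spark 3.4), so one common sketch is to serialize each row inside a UDF using a protoc-generated message class and hand Kafka the resulting bytes. Everything here is an assumption: `UserLog` is a hypothetical generated class, `df` is the transformed data frame with `id` and `count` columns, and broker/topic/paths are placeholders:

```scala
import org.apache.spark.sql.functions.{col, udf}

// UserLog is a hypothetical class generated by protoc from a .proto definition
val toProto = udf { (id: String, count: Long) =>
  UserLog.newBuilder().setId(id).setCount(count).build().toByteArray
}

val out = df
  .withColumn("value", toProto(col("id"), col("count")))
  .selectExpr("CAST(id AS STRING) AS key", "value")

out.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("topic", "proto-topic")                      // assumed topic
  .option("checkpointLocation", "/tmp/chk-proto")      // assumed path
  .start()
```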
2
votes
1 answer
Right way to read stream from Kafka topic using checkpointLocation offsets
I'm trying to develop a small Spark app (using Scala) to read messages from Kafka (Confluent) and write (insert) them into a Hive table. Everything works as expected, except for one important feature: managing offsets when the application is…

deeplay
- 376
- 3
- 20
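Offsets survive restarts when `checkpointLocation` is set on the writer and kept stable between runs; on restart the query resumes from the checkpointed offsets, and `startingOffsets` applies only to the very first run. A sketch, with broker, topic, table, and path all assumed:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("kafka-to-hive").enableHiveSupport().getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("subscribe", "example-topic")                // assumed topic
  .option("startingOffsets", "earliest")               // honored only on first run
  .load()

df.writeStream
  .foreachBatch { (batch: DataFrame, _: Long) =>
    batch.write.mode("append").insertInto("mydb.mytable") // assumed Hive table
  }
  .option("checkpointLocation", "/warehouse/chk/my-app")  // keep stable across restarts
  .start()
```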
2
votes
1 answer
Does Spark Structured Streaming have some timeout issue when reading streams from a Kafka topic?
I implemented a Spark job to read a stream from a Kafka topic with foreachBatch in Structured Streaming.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "mykafka.broker.io:6667")
  .option("subscribe",…

yyuankm
- 295
- 4
- 22
2
votes
1 answer
Driver stops executors without a reason
I have an application based on Spark Structured Streaming 3 with Kafka which processes some user logs, and after some time the driver starts to kill the executors and I don't understand why.
The executors don't contain any errors. I'm…

M. Alexandru
- 614
- 5
- 20
2
votes
1 answer
Why a new batch is triggered without getting any new offsets in streaming source?
I have multiple Spark Structured Streaming jobs, and the usual behaviour I see is that a new batch is triggered only when there are new offsets in Kafka, which is used as the source to create the streaming query.
But when I run this example which…

conetfun
- 1,605
- 4
- 17
- 38
2
votes
1 answer
kafka kafka-consumer-groups.sh --describe returns no output for a consumer group
Kafka version 1.1
--list can get the consumer groups:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list --command-config config/client_security.properties
Note: This will not show information about old Zookeeper-based…

wyx
- 3,334
- 6
- 24
- 44
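`--describe` needs an explicit `--group`; empty output typically means the group name is wrong, the group stores offsets in the old ZooKeeper location, or the security config passed to `--list` was omitted. A sketch mirroring the command above (the group name is a placeholder):

```shell
# Describe one group; pass the same security config used with --list
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group \
  --command-config config/client_security.properties
```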
2
votes
3 answers
How do I convert a dataframe to JSON and write to kafka topic with key
I'm trying to write a dataframe to Kafka in JSON format and add a key to the data frame in Scala. I'm currently working with this sample from the kafka-spark integration:
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
.write
…

user2883071
- 960
- 1
- 20
- 50
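One way to extend that sample is to build the JSON value with `to_json` over a `struct` of all columns and cast one column as the key. A sketch, assuming a data frame `df` whose `id` column serves as the key, plus placeholder broker and topic:

```scala
import org.apache.spark.sql.functions.{col, struct, to_json}

// Serialize every column into a JSON value; use one column as the Kafka key
val out = df.select(
  col("id").cast("string").as("key"),                  // assumed key column
  to_json(struct(df.columns.map(col): _*)).as("value")
)

out.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("topic", "json-topic")                       // assumed topic
  .save()
```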