Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
Questions tagged [spark-streaming-kafka]
250 questions
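
A minimal sketch of the direct-stream pattern described in the tag summary above, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration; the broker address, group id, and topic name are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils}

    object DirectStreamSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("direct-stream-sketch"), Seconds(5))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",            // placeholder
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "sketch-group",
          "enable.auto.commit" -> (false: java.lang.Boolean))

        // One Spark partition per Kafka partition; offsets are exposed on each RDD.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("topic1"), kafkaParams))

        stream.foreachRDD { rdd =>
          val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          ranges.foreach(r => println(s"${r.topic}/${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }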
0 votes, 0 answers
"no main manifest attribute" in Spring Boot Application using Spark-Streaming when packaged with maven-shade-plugin
Here is my pom.xml:
…

charany1 · 871 · 2 · 12 · 27
0 votes, 1 answer
Duplicates while publishing data to kafka topic using spark-streaming
I have a spark-streaming application which consumes data from topic1 and parses it, then publishes the same records via 2 processes: one into topic2 and the other to a hive table. While publishing data to kafka topic2 I see duplicates and I don't see…

user5463155 · 81 · 1 · 5
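
If the duplicates come from producer retries rather than application logic, one common mitigation is an idempotent producer (Kafka 0.11+ brokers). A hedged sketch; note that this does not protect against Spark re-executing a task and re-sending records, and the broker address is a placeholder:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}

    // Idempotent producer: the broker de-duplicates retry-induced resends.
    def buildProducer(): KafkaProducer[String, String] = {
      val props = new Properties()
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer")
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer")
      props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
      props.put(ProducerConfig.ACKS_CONFIG, "all") // required when idempotence is on
      new KafkaProducer[String, String](props)
    }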
0 votes, 2 answers
How do I continuously stream data from kafka using spark structured streaming?
I am trying to migrate my DStream APIs to structured streaming and am stumbling over how to await, i.e. I am not able to relate micro-batching to structured streaming.
In the code below I am creating a direct stream and awaiting forever so that I could…

Mozhi · 757 · 1 · 11 · 28
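
For the migration question above, the structured-streaming counterpart of "create a direct stream and await forever" is a streaming query plus awaitTermination(). A minimal sketch assuming the spark-sql-kafka-0-10 package, with connection details as placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-structured-sketch").getOrCreate()

    // readStream keeps pulling new Kafka records; there is no explicit loop.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "topic1")
      .load()

    val query = df.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    // Counterpart of ssc.awaitTermination(): block until the query stops.
    query.awaitTermination()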
0 votes, 1 answer
net.jpountz.lz4 exception when reading from kafka with spark streaming
I use spark 2.4.0 with python and read data from kafka_2.11-2.0.0 (binary, not source). I'm using spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar script.py; an error message appears in the error report, if anyone can…
0 votes, 1 answer
Understanding checkpointing in kafka structured streaming
In this article (https://dzone.com/articles/what-are-spark-checkpoints-on-dataframes) it says that checkpointing is used to "freeze the content of a dataframe before I do something else".
However in this…

Funzo · 1,190 · 2 · 14 · 25
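
The two notions of checkpointing at play in this question can be contrasted in code. A sketch with placeholder paths: Dataset.checkpoint() "freezes" a DataFrame by materializing it and truncating its lineage, while the streaming checkpointLocation persists offsets and state so a query can resume after a restart:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("checkpoint-sketch").getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/batch-checkpoints")

    // Batch-style checkpoint: materializes the data, cuts the lineage.
    val frozen = spark.range(100).toDF("n").checkpoint()

    // Streaming checkpoint: stores Kafka offsets and operator state for recovery.
    val streamDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "topic1")
      .load()

    val query = streamDf.writeStream
      .format("parquet")
      .option("path", "/tmp/out")
      .option("checkpointLocation", "/tmp/stream-checkpoints")
      .start()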
0 votes, 1 answer
Continuous processing mode and python udf
Does Spark 2.4.0 support Python UDFs with Continuous Processing mode?
In my simple code I'm consuming from a kafka topic, doing some trivial per-row processing (basically adding a dummy field to the JSON messages) and writing out to another…

Venki · 417 · 4 · 7
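
Continuous processing (added in Spark 2.3) is selected with a continuous trigger. A Scala sketch of the Kafka-in, Kafka-out pipeline shape the question describes; whether a Python UDF may sit in the middle is exactly what the question asks, so only the trigger wiring is shown, and all connection details are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder().appName("continuous-sketch").getOrCreate()

    val in = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "topic1")
      .load()

    // Continuous mode only allows map-like (per-row) operations.
    val query = in.selectExpr("CAST(value AS STRING) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "topic2")
      .option("checkpointLocation", "/tmp/continuous-checkpoints")
      .trigger(Trigger.Continuous("1 second")) // checkpoint interval, not a batch interval
      .start()

    query.awaitTermination()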
0 votes, 1 answer
Splitting Kafka Message Line by line in Spark Structured Streaming
I want to read a message from a Kafka topic into a data frame in my Spark Structured Streaming job, but I am getting the entire message at one offset, so in the data frame the message lands in one row instead of multiple rows (in my case it is 3…

Atanu chatterjee · 457 · 5 · 16
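
A common way to turn one multi-line Kafka payload into one DataFrame row per line is split plus explode. A hedged sketch, assuming newline-delimited payloads and placeholder connection settings:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode, split}

    val spark = SparkSession.builder().appName("split-sketch").getOrCreate()

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "topic1")
      .load()

    // One Kafka record may carry several newline-separated lines;
    // explode the split payload so each line becomes its own row.
    val lines = raw
      .selectExpr("CAST(value AS STRING) AS payload")
      .select(explode(split(col("payload"), "\n")).as("line"))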
0 votes, 1 answer
How to get aggregated data for a particular day in spark structured streaming
I have a spark structured streaming job that reads streams from kafka and writes output to HDFS.
My issue is that I need aggregated results for the entire day, up to a particular time.
Since spark structured streaming doesn't support complete/update mode,…

BigD · 850 · 2 · 17 · 40
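
One workable shape for a running per-day aggregate is a day-long window in update mode; since the file sink only supports append, the per-trigger snapshot can be written to HDFS from foreachBatch (Spark 2.4+). A sketch with placeholder paths and connection settings:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, window}

    val spark = SparkSession.builder().appName("daily-agg-sketch").getOrCreate()

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "topic1")
      .load()
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")

    // Running count per day, updated every micro-batch.
    val daily = events
      .withWatermark("timestamp", "1 hour")
      .groupBy(window(col("timestamp"), "1 day"))
      .count()

    // Explicitly typed function avoids the Scala/Java foreachBatch overload ambiguity.
    val writeBatch: (DataFrame, Long) => Unit = (batch, id) =>
      batch.write.mode("overwrite").parquet(s"/tmp/daily-agg/batch_$id")

    val query = daily.writeStream
      .outputMode("update")
      .foreachBatch(writeBatch)
      .start()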
0 votes, 1 answer
How do I prioritise maven dependency over Spark classpath when submitting Spark job?
I have a Cloudera distribution of Hadoop, Spark etc. where the Spark-Kafka version is 0.8 (i.e. spark-streaming-kafka-0-8_2.11).
The issue is, version 0.8 of the Apache Spark Kafka integration has Kafka version 0.8.2.1 built in, and I require…
user10486861
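
The usual knobs for this are the (experimental) userClassPathFirst settings, which make jars shipped with the application win over the cluster-provided classpath; shading the conflicting packages is the other common route. A sketch of setting them programmatically; they are just as often passed as --conf flags to spark-submit:

    import org.apache.spark.SparkConf

    // Experimental: prefer user jars over Spark's own classpath.
    val conf = new SparkConf()
      .setAppName("user-classpath-first")
      .set("spark.driver.userClassPathFirst", "true")   // effective in cluster deploy mode
      .set("spark.executor.userClassPathFirst", "true")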
0 votes, 1 answer
Spark Streaming specify starting-ending offsets
I have a scenario where I want to re-process a particular batch of data coming in from Kafka using Spark DStreams.
Let's say I want to re-process the following batches of data:
Topic-Partition1-{1000,2000}
Topic-Partition2-{500-600}
Below is the…

Venkata · 317 · 3 · 13
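
For re-processing fixed offset windows, the 0-10 integration offers KafkaUtils.createRDD, which reads explicit OffsetRange bounds as a batch RDD rather than a live stream. A sketch mirroring the ranges quoted above, with placeholder connection settings:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

    def reprocess(sc: SparkContext): Unit = {
      val kafkaParams = new java.util.HashMap[String, Object]()
      kafkaParams.put("bootstrap.servers", "localhost:9092") // placeholder
      kafkaParams.put("key.deserializer", classOf[StringDeserializer])
      kafkaParams.put("value.deserializer", classOf[StringDeserializer])
      kafkaParams.put("group.id", "reprocess-group")

      // Half-open ranges: [fromOffset, untilOffset)
      val ranges = Array(
        OffsetRange("Topic", 1, 1000, 2000),
        OffsetRange("Topic", 2, 500, 600))

      val rdd = KafkaUtils.createRDD[String, String](
        sc, kafkaParams, ranges, LocationStrategies.PreferConsistent)
      rdd.map(_.value()).take(10).foreach(println)
    }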
0 votes, 1 answer
spark streaming kafka issue while streaming is started
I am trying to read data from a kafka consumer using spark2-shell.
Please find my code below.
I start my spark2-shell in the following way:
spark2-shell --jars kafka-clients-0.10.1.2.6.2.0-205.jar, spark-sql-kafka-0-10_2.11-2.1.1.jar
And please find my…

Abhishek Allamsetty · 54 · 1 · 1 · 7
0 votes, 2 answers
Spark streaming from kafka topic using scala
I am new to scala/Spark development. I have created a simple streaming application from a Kafka topic using sbt and scala. I have the following code:
build.sbt
name := "kafka-streaming"
version := "1.0"
assemblyOption in assembly := (assemblyOption…

Abdul Manaf · 4,933 · 8 · 51 · 95
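
A minimal build.sbt sketch for an application like this, assuming Spark 2.4 with the 0-10 Kafka integration; versions are illustrative:

    name := "kafka-streaming"
    version := "1.0"
    scalaVersion := "2.11.12"

    val sparkVersion = "2.4.0"

    libraryDependencies ++= Seq(
      // "provided" keeps Spark itself out of the assembly jar
      "org.apache.spark" %% "spark-core"      % sparkVersion % "provided",
      "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
      // the Kafka integration must be bundled; clusters do not ship it
      "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion)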
0 votes, 1 answer
Closing Spark Streaming Context after first batch (trying to retrieve kafka offsets)
I am trying to retrieve Kafka offsets for my Spark batch job. After retrieving the offsets, I would like to close the stream context.
I tried adding a StreamingListener to the stream context and implementing the onBatchCompleted method to close the…
user10486861
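
A sketch of the listener approach the question describes: stop the context from a separate thread once the first batch completes, since calling stop() directly inside the callback can deadlock the listener bus:

    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    def stopAfterFirstBatch(ssc: StreamingContext): Unit = {
      ssc.addStreamingListener(new StreamingListener {
        override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
          // Stop asynchronously; keep the SparkContext alive for follow-up batch work.
          new Thread(new Runnable {
            override def run(): Unit =
              ssc.stop(stopSparkContext = false, stopGracefully = true)
          }).start()
        }
      })
    }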
0 votes, 1 answer
Avoid write files for empty partitions in Spark Streaming
I have a Spark Streaming job which reads data from kafka partitions (one executor per partition).
I need to save transformed values to HDFS, but I need to avoid the creation of empty files.
I tried to use isEmpty, but this doesn't help when not all partitions…

Ruslan Ostafiichuk · 4,422 · 6 · 30 · 35
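
One blunt but effective pattern: skip the batch entirely when it is empty, and coalesce before saving so empty Kafka partitions do not each leave a zero-byte part file; the coalesce target trades write parallelism for fewer, non-empty files. A sketch:

    import org.apache.spark.streaming.dstream.DStream

    def saveNonEmpty(stream: DStream[String], outDir: String): Unit = {
      stream.foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          // coalesce(1) guarantees a single non-empty file per batch;
          // raise the target if throughput needs more writers.
          rdd.coalesce(1).saveAsTextFile(s"$outDir/batch-${time.milliseconds}")
        }
      }
    }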
0 votes, 0 answers
Apache spark streaming kafka API vs kinesis API
I have a scala spark application in which I need to switch between streaming from kafka and kinesis based on the application configuration.
Both the spark APIs for kafka streaming (spark-streaming-kafka-0-10_2.11) and kinesis streaming…

Biju Gopinathan · 105 · 10
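
One way to keep the rest of the application source-agnostic is a small factory keyed on configuration, returning a plain DStream[String]. A sketch; the Kafka branch uses spark-streaming-kafka-0-10, and the kinesis branch is left as a stub since its builder API is not shown in the question:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.KafkaUtils

    // Downstream code depends only on DStream[String], not on the source.
    def buildSource(ssc: StreamingContext, source: String): DStream[String] = source match {
      case "kafka" =>
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092", // placeholder
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "switchable-app")
        KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("topic1"), kafkaParams))
          .map(_.value())
      case "kinesis" =>
        // build via spark-streaming-kinesis-asl here, decoding Array[Byte] to String
        ???
      case other =>
        throw new IllegalArgumentException(s"unknown source: $other")
    }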