Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
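In the direct approach, each batch is described by one offset range per Kafka partition, and each range becomes one Spark partition. The sketch below is a plain-Python model of that planning step, not Spark code. `OffsetRange` is a hypothetical stand-in whose fields loosely mirror Spark's own offset-range class, `plan_batch` is a hypothetical helper, and starting unseen partitions at offset 0 is a simplifying assumption (a real job applies its `auto.offset.reset` setting).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical model of the half-open range [from_offset, until_offset) for one Kafka partition."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def count(self) -> int:
        # Number of records this range (and thus its Spark partition) will read.
        return self.until_offset - self.from_offset


def plan_batch(committed: Dict[TopicPartition, int],
               latest: Dict[TopicPartition, int]) -> List[OffsetRange]:
    """Build exactly one offset range per Kafka partition; each maps to one Spark partition."""
    ranges = []
    for (topic, partition), until in sorted(latest.items()):
        # Assumption of this model: a partition with no committed offset starts at 0.
        start = committed.get((topic, partition), 0)
        ranges.append(OffsetRange(topic, partition, start, until))
    return ranges


if __name__ == "__main__":
    committed = {("events", 0): 100, ("events", 1): 40}
    latest = {("events", 0): 150, ("events", 1): 40, ("events", 2): 7}
    # The list index plays the role of the Spark partition number.
    for spark_partition, r in enumerate(plan_batch(committed, latest)):
        print(spark_partition, r.topic, r.partition, r.from_offset, r.until_offset, r.count())
```

Because the plan is a list of ranges, the number of Spark partitions in a batch equals the number of Kafka partitions, and the offsets of every batch are known before any data is read.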

250 questions
2
votes
2 answers

Deserializing Spark structured stream data from Kafka topic

I am working with Kafka 2.3.0 and Spark 2.3.4. I have already built a Kafka connector that reads a CSV file and posts a line from the CSV to the relevant Kafka topic. The line is like…
2
votes
0 answers

Kafka Spark Streaming failing with source KAFKA not found

I am trying to stream a producer topic from Kafka, but I get an error that Kafka is not a valid data source. I imported all the required packages, like Kafka, SQL streaming, etc. build.gradle file: dependencies { compile group: 'org.apache.kafka',…
Raptor0009
2
votes
0 answers

How to build a DataFrame that's a join of two Kafka streams, with "key(s)" columns and the remaining columns being the latest values

My Spark 2.4.x (PySpark) app requires: inputs are two Kafka topics and the output is a Kafka topic; a "streaming table" where there are logical key(s) and the remaining columns should be the latest values from either stream; sub-second latency.…
2
votes
0 answers

Exactly once with Kafka + Spark Streaming

Is it possible to achieve exactly-once semantics when consuming a Kafka topic in a Spark Streaming application? To achieve exactly once, you need the following: exactly once from the Kafka producer to the Kafka broker. This is achieved by Kafka 0.11's idempotent…
VB_
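On the Spark side of the exactly-once question above, one common approach is an idempotent sink: if a batch is replayed after a failure, its writes overwrite earlier ones instead of duplicating them. The plain-Python sketch below models that idea with a dictionary keyed by each record's source coordinates (topic, partition, offset). `IdempotentSink` is a hypothetical name, and in a real job the dictionary would be an external store that supports upserts.

```python
from typing import Dict, Iterable, Tuple

# (topic, partition, offset, value) for one consumed Kafka record.
Record = Tuple[str, int, int, str]


class IdempotentSink:
    """Hypothetical sink: each write is keyed by its source coordinates, so replays are harmless."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, int, int], str] = {}

    def write_batch(self, records: Iterable[Record]) -> None:
        for topic, partition, offset, value in records:
            # Upsert: writing the same (topic, partition, offset) again replaces, never appends.
            self.rows[(topic, partition, offset)] = value


if __name__ == "__main__":
    batch = [("t", 0, 10, "a"), ("t", 0, 11, "b"), ("t", 1, 5, "c")]
    sink = IdempotentSink()
    sink.write_batch(batch)
    sink.write_batch(batch)  # simulated replay of the same batch after a failure
    print(len(sink.rows))  # 3, not 6
```

The design choice is to derive the output key from the Kafka offset, which is stable across retries, rather than from anything generated at processing time.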
2
votes
0 answers

Spark SQL streaming with Kafka on JSON data: function from_json not able to parse multiline JSON coming from Kafka topic

Here, I am sending JSON data to the Kafka "test" topic, applying a schema to the JSON, doing some transformations and printing the result to the console. Here is the code: val kafkadata = spark .readStream .format("kafka") …
2
votes
1 answer

Spark continuous processing mode does not read all kafka topic partition

I'm experimenting with Spark's Continuous Processing mode in Structured Streaming and I'm reading from a Kafka topic with 2 partitions while the Spark application has only one executor with one core. The application is a simple one where it simply…
2
votes
1 answer

Manually commit offset in kafka Direct Stream in python

I am porting a streaming application written in Scala to Python. I want to manually commit offsets for a DStream. In Scala this is done as follows: stream = KafkaUtils.createDirectStream(someConfigs) stream.foreachRDD { rdd => val offsetRanges =…
Girish Gupta
2
votes
2 answers

Spark Structured Streaming Kafka Microbatch count

I am using Spark Structured Streaming to read records from a Kafka topic; I intend to count the number of records received in each micro-batch of the Spark readStream. This is a snippet: val kafka_df = sparkSession .readStream .format("kafka") …
2
votes
1 answer

Spark Streaming Kafka offset management

I have been running Spark Streaming jobs that consume and produce data through Kafka. I used a direct DStream, so I had to manage offsets myself; we adopted Redis to write and read offsets. Now there is one problem: when I launched my client, my client…
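With a direct stream and self-managed offsets, the usual ordering is: read the stored offsets when the job starts, process a batch, and save the batch's end offsets only after its output has been written. The plain-Python sketch below models that ordering. A dictionary stands in for Redis, a Kafka partition is modeled as a list whose index is the offset, and `OffsetStore` and `run_batch` are hypothetical names. Because offsets are saved after the output, a crash between the two steps replays the batch on restart (at-least-once delivery).

```python
from typing import Callable, Dict, List, Tuple

TopicPartition = Tuple[str, int]


class OffsetStore:
    """Hypothetical stand-in for an external offset store such as Redis (here, an in-memory dict)."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    @staticmethod
    def _key(tp: TopicPartition) -> str:
        # Redis-style string key, e.g. "events:0".
        return f"{tp[0]}:{tp[1]}"

    def load(self, partitions: List[TopicPartition]) -> Dict[TopicPartition, int]:
        # Assumption of this model: partitions with no stored offset start at 0.
        return {tp: self._data.get(self._key(tp), 0) for tp in partitions}

    def save(self, offsets: Dict[TopicPartition, int]) -> None:
        for tp, offset in offsets.items():
            self._data[self._key(tp)] = offset


def run_batch(store: OffsetStore,
              log: Dict[TopicPartition, List[str]],
              write_output: Callable[[List[str]], None]) -> None:
    """Process every record after the stored offsets, then persist the new end offsets."""
    start = store.load(list(log))
    records = [v for tp, msgs in sorted(log.items()) for v in msgs[start[tp]:]]
    write_output(records)                                     # 1. write the output first
    store.save({tp: len(msgs) for tp, msgs in log.items()})   # 2. only then save offsets


if __name__ == "__main__":
    log = {("t", 0): ["a", "b"], ("t", 1): ["c"]}
    store = OffsetStore()
    out: List[str] = []
    run_batch(store, log, out.extend)
    log[("t", 0)].append("d")  # new message arrives
    run_batch(store, log, out.extend)
    print(out)  # ['a', 'b', 'c', 'd']
```

If `write_output` raises, `store.save` is never reached, so the next run starts from the previous offsets and reprocesses the batch rather than skipping it.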
2
votes
1 answer

Spark Streaming kafka concurrentModificationException

I am using a Spark Streaming application. The application reads messages from a Kafka topic (with 200 partitions) using a direct stream. Occasionally the application throws a ConcurrentModificationException: java.util.ConcurrentModificationException:…
2
votes
1 answer

Spark Streaming Kafka java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer

I am using Spark Streaming with the Kafka integration. When I run the streaming application from my IDE in local mode, everything works like a charm. However, as soon as I submit it to the cluster, I keep getting the following error:…
MaatDeamon
2
votes
1 answer

Spark Streaming, kafka: java.lang.StackOverflowError

I am getting the error below in a Spark Streaming application; I am using Kafka for the input stream. When I was using a socket source, it worked fine, but when I changed to Kafka it started giving this error. Does anyone have an idea why it's throwing the error? Do I need to…
Vibhuti
2
votes
0 answers

Spark Streaming with Kafka

I want to do simple machine learning in Spark. First, the application should learn from historical data in a file and train the machine learning model, and then read input from Kafka to give predictions in real time. To do that, I believe I…
2
votes
2 answers

spark-submit failed with Spark Streaming wordcount Python code

I just copied the Spark Streaming wordcount Python code and used spark-submit to run it on a Spark cluster, but it shows the following errors: py4j.protocol.Py4JJavaError: An error occurred while calling o23.loadClass. :…
Jack
1
vote
0 answers

Spark Streaming query dependency

I have a use case where two Spark streaming queries are running and consuming data from the same topic. I want a guarantee that Query1 consumes and processes the data from the topic before Query2 does. Is there a way I can achieve this in…