Questions tagged [spark-streaming-kafka]
Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
250 questions
1 vote · 1 answer
Spark Streaming Kafka Integration direct Approach EOFException
When I run the Spark Streaming example org.apache.spark.examples.streaming.JavaDirectKafkaWordCount, I get the EOFException below. How can I resolve it?
Exception in thread "main" org.apache.spark.SparkException: java.io.EOFException: Received -1 when…

vhypnus · 11 · 3
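An exception ending in "Received -1 when reading from channel" generally means the broker closed the connection, which is commonly caused by pointing the broker list at the wrong port (for example ZooKeeper's 2181 instead of the Kafka broker's 9092) or by a client/broker version mismatch. A minimal, hypothetical pure-Python helper that sanity-checks a broker list before submitting the job (the function name and the port heuristic are my own, not part of Spark or Kafka):

```python
def check_bootstrap_servers(servers: str) -> list:
    """Return warnings for broker entries that look misconfigured.

    `servers` is a comma-separated "host:port" list, as passed to the
    Kafka "bootstrap.servers" setting.
    """
    warnings = []
    for entry in servers.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            warnings.append(f"{entry}: expected host:port")
            continue
        if int(port) == 2181:
            warnings.append(f"{entry}: port 2181 is ZooKeeper, not a Kafka broker")
    return warnings
```

Running this over the broker list used in the job's configuration catches the most common misconfiguration before the EOFException surfaces at runtime.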
0 votes · 0 answers
Handling changes in spark streaming pipelines
What is the common/suggested practice when we need to perform re-ingestion in a Spark Structured Streaming pipeline? For example: a bug in the consumer streaming code that reads from a queue. In such cases we, as the consumer reading from the queue, would need to…

steve · 129 · 2 · 9
0 votes · 0 answers
Kafka consumer fetches new messages only after restarting the consumer
I'm facing an issue with my Kafka consumer job written in Scala. When we start the consumer, it fetches all messages available in the broker from the last consumed offset, processes those JSON messages, and writes them to a Hive table. After writing, it…

Mani Ganesh · 1 · 2
0 votes · 0 answers
Spark structured stream aggregation: evicted executors (does not seem to be heap related)
We are seeing an issue with executors being evicted in our Spark streaming application. We are doing native Spark aggregation from a Kafka stream using watermarking:
spark.readStream()
.format("kafka")
.withWatermark( "watermarked_timestamp",…

Ivan Murray · 1 · 1
0 votes · 1 answer
Disable Kafka Warnings for Spark Streaming Application
I use Spark Structured Streaming (PySpark) to read data from a Kafka topic. It works well, but when I open the executors' stderr, my whole log page is WARN messages from Kafka saying that
KafkaDataConsumer is not running in UninterruptibleThread. It may hang when…

Huvi · 63 · 2 · 7
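The usual way to silence executor-side warnings like this is to raise the log level for the offending packages in the log4j configuration shipped to the executors (for example via `--files` plus `spark.executor.extraJavaOptions=-Dlog4j.configurationFile=log4j2.properties` on log4j2-based Spark, 3.3+). A sketch of the relevant log4j2 properties; the logger names are an assumption based on the package that emits the warning and may need adjusting to match the lines in your stderr:

```properties
# Quiet the Kafka consumer used by the Spark-Kafka connector
logger.kafka010.name = org.apache.spark.sql.kafka010
logger.kafka010.level = error
logger.kafkaclients.name = org.apache.kafka.clients.consumer
logger.kafkaclients.level = error
```

Note that `sparkContext.setLogLevel(...)` only affects the driver, which is why the executor stderr stays noisy without a shipped config file.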
0 votes · 0 answers
error reading Scala signature of org.apache.spark.internal.Logging: unsafe symbol Level (child of package log4j) in runtime reflection universe
Spark 3.4.0
Scala 2.12.12 at compile time; Scala 2.12.17 packaged with Spark at runtime
Kafka
While migrating from Spark 2.2 I am facing issues reading Kafka streams and getting the error below. Experts, please help me as I am relatively new to the…

Chandan Gawri · 364 · 1 · 4 · 15
0 votes · 0 answers
Spark Streaming Kafka auth with SSL certificates
I am faced with the problem of authenticating to a Kafka topic using SSL from Spark Streaming.
I've got 3 SSL certs in PEM format for authentication to the Kafka topic:
ssl_cafile
ssl_certfile
ssl_keyfile.
In kafka-python I'm using them in such…
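Since Kafka 2.7 the client can accept PEM material directly, so the kafka-python style cafile/certfile/keyfile inputs can be mapped onto the `kafka.`-prefixed options that the Spark Kafka source passes through to the consumer. A sketch of that mapping; the helper name is mine, and the option spellings should be verified against the Kafka client version bundled with your Spark:

```python
def pem_kafka_options(ca_pem: str, cert_pem: str, key_pem: str) -> dict:
    """Map inline PEM strings onto the `kafka.`-prefixed options the
    Spark Kafka source forwards to the underlying Kafka consumer."""
    return {
        "kafka.security.protocol": "SSL",
        # Trust store: the CA certificate(s), inline PEM
        "kafka.ssl.truststore.type": "PEM",
        "kafka.ssl.truststore.certificates": ca_pem,
        # Key store: client certificate chain and private key, inline PEM
        "kafka.ssl.keystore.type": "PEM",
        "kafka.ssl.keystore.certificate.chain": cert_pem,
        "kafka.ssl.keystore.key": key_pem,
    }

# Usage (not run here): feed every pair into the stream reader:
# reader = spark.readStream.format("kafka")
# for k, v in pem_kafka_options(ca, cert, key).items():
#     reader = reader.option(k, v)
```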
0 votes · 0 answers
Convert some specific columns that have 0 and 1 values in Kafka messages to False and True in PySpark
Requirement
We are consuming messages from Kafka using PySpark. In these JSON messages, some keys have values such as 0 and 1.
The requirement is to convert these 0s and 1s to False and True while…

tall-e.stark · 23 · 4
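In Spark SQL, casting an integer column to BOOLEAN maps 0 to false and non-zero to true, so the conversion can be a plain cast on the chosen columns. A pure-Python sketch that builds the expression strings for `df.selectExpr(*exprs)`; the helper name and the column names in the usage note are illustrative:

```python
def bool_cast_exprs(all_columns, flag_columns):
    """Build selectExpr() strings that cast the 0/1 flag columns to
    BOOLEAN and pass every other column through unchanged."""
    flags = set(flag_columns)
    return [
        f"CAST({c} AS BOOLEAN) AS {c}" if c in flags else c
        for c in all_columns
    ]

# Usage (not run here), after parsing the Kafka JSON payload:
# df = df.selectExpr(*bool_cast_exprs(df.columns, ["is_active", "deleted"]))
```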
0 votes · 0 answers
How to access another df in a Spark streaming process
I want to apply a matching algorithm to the live data coming into Kafka. I also initially load the table that should be checked and compared with the current stream, and need to check whether the current stream matches at least one record of the loaded table…

ArefehTam · 367 · 1 · 6 · 20
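Structured Streaming supports joining a streaming DataFrame with a static one (a stream-static join), so the loaded table can be read once with `spark.read` and joined against each micro-batch; a left semi join keeps exactly the stream rows that match at least one static record. A tiny pure-Python model of that semi-join semantics (the function and column names are illustrative):

```python
def semi_join(stream_rows, static_rows, key):
    """Keep stream rows whose `key` appears in the static table --
    the semantics of a left semi join against a static DataFrame."""
    static_keys = {row[key] for row in static_rows}
    return [row for row in stream_rows if row[key] in static_keys]

# In PySpark (not run here) the equivalent is roughly:
# matched = stream_df.join(static_df, on="customer_id", how="left_semi")
```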
0 votes · 0 answers
Handle Schema evolution while consuming messages from Kafka using PySpark
I am new to Kafka. Currently I am working on a requirement.
Use case:
I am consuming messages from Kafka (the messages are produced in Kafka by an upstream team). The upstream team doesn't maintain schema versions and hasn't implemented a schema…

tall-e.stark · 23 · 4
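Without a schema registry, one common workaround is to keep a superset schema on the consumer side: read `value` as a string, parse it with `from_json`, and widen the known schema whenever new keys appear. A pure-Python sketch of the widening step using only the json stdlib; the merging policy (unknown fields default to "string") is an assumption, not a Spark feature:

```python
import json

def widen_schema(known_fields: dict, raw_message: str) -> dict:
    """Return a copy of `known_fields` (name -> Spark type string)
    extended with any top-level keys seen in `raw_message` but not
    yet known. New fields are registered as "string" -- a safe
    default that avoids parse failures when upstream silently adds
    columns."""
    merged = dict(known_fields)
    for key in json.loads(raw_message):
        merged.setdefault(key, "string")
    return merged

# Usage (not run here): rebuild the from_json DDL schema, then parse:
# ddl = ", ".join(f"{name} {typ}" for name, typ in merged.items())
# df.select(F.from_json(F.col("value").cast("string"), ddl).alias("data"))
```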
0 votes · 0 answers
Spark Streaming - NoSuchMethodError: scala.collection.immutable.Map$.apply
I have a simple Spark streaming Java program:
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import…

Eugene Goldberg · 14,286 · 20 · 94 · 167
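A NoSuchMethodError on `scala.collection.immutable.Map$.apply` almost always means mixed Scala binary versions on the classpath, e.g. a spark-streaming-kafka artifact built for Scala 2.11 running against a Spark built for 2.12; the fix is to use one `_2.x` suffix across all Spark artifacts. A small pure-Python helper that checks a dependency list for the mismatch (this helper is mine, not a feature of any build tool):

```python
import re

def scala_suffixes(artifact_ids):
    """Extract the distinct Scala binary-version suffixes (e.g. '2.12')
    from Maven artifact ids; more than one means a binary-incompatible
    mix that typically surfaces as NoSuchMethodError at runtime."""
    found = set()
    for aid in artifact_ids:
        m = re.search(r"_(2\.\d+)$", aid)
        if m:
            found.add(m.group(1))
    return found

deps = ["spark-streaming_2.12", "spark-streaming-kafka-0-10_2.11", "spark-sql_2.12"]
# len(scala_suffixes(deps)) > 1  -> mismatched Scala versions in the build
```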
0 votes · 0 answers
In Spark streaming, getting no output in append mode
I am reading data from a Kafka topic and then doing a group-by on the messages received from the topic:
val lines = spark.readStream.format("Kafka")...
val df1 = lines.select($"timestamp", $"value".cast("STRING"))
...// Created a schema and fetched message…

Richa · 1 · 1
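In append mode, an aggregation emits a group only after the watermark passes the end of its window; a `groupBy` without `withWatermark` applied to the event-time column therefore never produces output. A pure-Python simulation of that eviction rule (window/watermark arithmetic only, no Spark) to show when rows actually appear:

```python
def finalized_windows(events, window_size, delay):
    """Simulate append-mode semantics: bucket event times into tumbling
    windows and emit a window's count only once the watermark
    (max event time seen - delay) has passed the window's end."""
    counts, emitted, max_time = {}, [], None
    for t in events:
        start = (t // window_size) * window_size
        counts[start] = counts.get(start, 0) + 1
        max_time = t if max_time is None else max(max_time, t)
        watermark = max_time - delay
        for ws in sorted(list(counts)):
            if ws + window_size <= watermark:  # window closed: emit once
                emitted.append((ws, counts.pop(ws)))
    return emitted

# With 10s windows and no delay, events at t=1 and t=2 alone produce
# nothing; only an event at t=11 moves the watermark past the end of
# window [0, 10) and lets its count be appended.
```

This mirrors why a streaming group-by without a watermark shows no output in append mode: no window ever closes, so nothing is ever appended.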
0 votes · 0 answers
Unable to run Kafka producer for Twitter data
I am trying to run Kafka to stream Twitter data using some specific keywords; in my case I used "Dollar8". But I am getting the error shown below. I am using macOS.
import tweepy
from kafka import KafkaProducer
import logging
from tweepy import…

Joey Tribbiani · 5 · 4
0 votes · 1 answer
Async Checkpointing in Spark Structured Streaming using RocksDB
I am currently exploring enabling async checkpointing in Spark Structured Streaming, but I am not able to find any way to do so. Databricks offers this for its flavour of Spark.
Spark Structured Streaming 3.3.1 and RocksDB 7.7.3
Any…

Aviral Kumar · 814 · 1 · 15 · 40
0 votes · 1 answer
Error: Queries with streaming sources must be executed with writeStream.start();; kafka
I was trying to handle real-time data streaming from Kafka using PySpark.
I have a table that gets updated in real time. Whenever there is content in the table, I need to aggregate it and stream the count to another consumer. While I tried to do it I…

Scarlett Code · 1 · 1