Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
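The 1:1 mapping can be pictured as per-partition offset ranges: each micro-batch claims, for every Kafka partition, the span from the last processed offset to the latest one, and each span becomes one Spark partition. A minimal pure-Python sketch of this bookkeeping (`OffsetRange` mirrors Spark's class of the same name; `plan_batch` and the sample offsets are hypothetical, for illustration only):

```python
from collections import namedtuple

# Mirrors org.apache.spark.streaming.kafka.OffsetRange: one range per
# Kafka partition, and each range becomes exactly one Spark partition.
OffsetRange = namedtuple("OffsetRange", ["topic", "partition", "from_offset", "until_offset"])

def plan_batch(latest_offsets, committed_offsets):
    """Hypothetical helper: for each Kafka partition, read from the last
    committed offset up to the latest offset seen when the batch starts."""
    return [
        OffsetRange(topic, p, committed_offsets.get((topic, p), 0), until)
        for (topic, p), until in sorted(latest_offsets.items())
    ]

latest = {("events", 0): 120, ("events", 1): 95}
committed = {("events", 0): 100}
for r in plan_batch(latest, committed):
    # one OffsetRange (= one Spark partition) per Kafka partition
    print(r.topic, r.partition, r.from_offset, r.until_offset)
```

Because the ranges are known up front, the consumer can also expose them as metadata, which is the "access to offsets" part of the description above.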
Questions tagged [spark-streaming-kafka]
250 questions
4 votes · 3 answers
PySpark and Kafka "Set are gone. Some data may have been missed.."
I'm running PySpark using a Spark cluster in local mode and I'm trying to write a streaming DataFrame to a Kafka topic.
When I run the query, I get the following message:
java.lang.IllegalStateException: Set(topicname-0) are gone. Some data may have…

Merlijn Sebrechts · 545
4 votes · 1 answer
How do partitions work in Spark Streaming?
I am working on performance improvement of a Spark Streaming application.
How do partitions work in a streaming environment? Is it the same as loading a file into Spark, or does it always create only one partition, making it work on only one core of…

Jithesh Gopinathan · 428
4 votes · 1 answer
some kafka params in spark-streaming-kafka-0-10_2.10 are fixed to none
I am using spark-streaming-kafka-0-10_2.10 version 2.0.2 for a Spark Streaming job. I got warnings like this:
17/10/10 16:42:25 WARN KafkaUtils: overriding enable.auto.commit to false for executor
17/10/10 16:42:25 WARN KafkaUtils: overriding…

deerluffy · 41
4 votes · 2 answers
Spark streaming applications subscribing to same kafka topic
I am new to Spark and Kafka, and I have a slightly different usage pattern of Spark Streaming with Kafka.
I am using
spark-core_2.10 - 2.1.1
spark-streaming_2.10 - 2.1.1
spark-streaming-kafka-0-10_2.10 - 2.0.0
kafka_2.10 - 0.10.1.1
Continuous…

Gurubg · 73
4 votes · 3 answers
spark streaming + kafka - spark session API
I would appreciate your help running a Spark Streaming program using Spark 2.0.2.
The run fails with "java.lang.ClassNotFoundException: Failed to find data source: kafka". I modified the POM file as below.
The Spark session is created, but it errors when the load from…

Aavik · 967
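A note on the question above: "Failed to find data source: kafka" usually means the Kafka SQL connector is not on the classpath; it ships as a separate artifact from spark-core and spark-streaming. A hedged POM sketch — the Scala suffix and version here are assumptions based on the Spark version mentioned, and must match the actual build:

```xml
<!-- Hedged sketch: the "kafka" data source lives in its own artifact.
     Adjust the Scala suffix (_2.10/_2.11) and version to your build. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.0.2</version>
</dependency>
```

The same artifact can alternatively be supplied at submit time via the `--packages` option instead of the POM.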
4 votes · 2 answers
Spark Streaming Kafka Consumer
I'm attempting to set up a simple Spark Streaming app that will read messages from a Kafka topic.
After much work I am at this stage, but I get the exceptions shown below.
Code:
public static void main(String[] args) throws Exception {
String…

Ken Alton · 686
4 votes · 2 answers
Apache Kafka and Spark Streaming
I'm reading through this blog post:
http://blog.jaceklaskowski.pl/2015/07/20/real-time-data-processing-using-apache-kafka-and-spark-streaming.html
It discusses using Spark Streaming and Apache Kafka to do some near-real-time processing. I…

joesan · 13,963
4 votes · 2 answers
Kafka Spark streaming: unable to read messages
I am integrating Kafka and Spark using Spark Streaming. I have created a topic as a Kafka producer:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
I am publishing messages in kafka and…

aiman · 1,049
3 votes · 1 answer
MicroBatchExecution: Query terminated with error UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Here I am trying to execute Structured Streaming with Apache Kafka, but it is not working and throws this error (ERROR MicroBatchExecution: Query [id = daae4c34-9c8a-4c28-9e2e-88e5fcf3d614, runId = ca57d90c-d584-41d3-a8de-6f9534ead0a0]…

Jahadul Rakib · 476
3 votes · 1 answer
How to specify the group id of kafka consumer for spark structured streaming?
I would like to run two Spark Structured Streaming jobs in the same EMR cluster to consume the same Kafka topic. Both jobs are in running status. However, only one job gets the Kafka data. My configuration for the Kafka part is as follows.
…

yyuankm · 295
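Context for the question above: with classic Kafka consumer groups, two consumers sharing one group id split the topic's partitions between them, so with few partitions one job can receive nothing. Structured Streaming before Spark 3.0 rejects a user-supplied `kafka.group.id` and generates a unique group id per query; Spark 3.0 added the option. A hedged sketch of the relevant source options (names per the Kafka integration guide; the broker address and topic are placeholders, not run here):

```
# Each query needs its OWN checkpointLocation; sharing one breaks both jobs.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")   # placeholder
      .option("subscribe", "mytopic")                    # placeholder
      # .option("kafka.group.id", "my-group")            # Spark 3.0+ only
      .load())
```

With the auto-generated unique group ids, both queries independently receive every record of the topic, which is usually the behavior wanted here.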
3 votes · 0 answers
Spark Streaming: Many queued batches after a long time running without problems
We wrote a Spark Streaming application that receives Kafka messages (backpressure enabled and spark.streaming.kafka.maxRatePerPartition set), maps the DStream into a Dataset, and writes these datasets to Parquet files (inside DStream.foreachRDD) at…

D. Müller · 3,336
3 votes · 0 answers
Issue reading multiline kafka messages in Spark
I'm trying to read multiline JSON messages on Spark 2.0.0, but I'm getting _corrupt_record. The code works fine for single-line JSON, and the multiline JSON works when I read it as wholeTextFiles in the REPL.
stream.map(record => (record.key(),…
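Background for the multiline question above: by default Spark's JSON source treats each physical line as a complete JSON document, so a record spread over several lines fails per-line parsing and lands in `_corrupt_record`, while reading the whole text at once succeeds. A minimal pure-Python illustration of that difference (the `json` module stands in for Spark's parser; `multiline` and `parse_per_line` are hypothetical examples):

```python
import json

# A single record pretty-printed across four physical lines.
multiline = '{\n  "user": "a",\n  "count": 1\n}'

def parse_per_line(text):
    """Mimic line-at-a-time JSON parsing: every line must stand alone."""
    out = []
    for line in text.splitlines():
        try:
            out.append(json.loads(line))
        except json.JSONDecodeError:
            out.append("_corrupt_record")
    return out

print(parse_per_line(multiline))  # each fragment is corrupt on its own
print(json.loads(multiline))      # parsing the whole text succeeds
```

This is why reading the file whole (wholeTextFiles, or the later multiLine option of the JSON source) works where the default line-based read does not.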
3 votes · 1 answer
Dispatch and initiate Spark Jobs on Kafka message
I have an external data source which sends the data through Kafka.
In fact, this is not the real data, but links to the data.
"type": "job_type_1"
"urls": [
"://some_file"
"://some_file"
]
There is a single topic, but it contains a type field, basing…

dr11 · 5,166
3 votes · 1 answer
Reading avro messages from Kafka in spark streaming/structured streaming
I am using pyspark for the first time.
Spark Version : 2.3.0
Kafka Version : 2.2.0
I have a Kafka producer which sends nested data in Avro format, and I am trying to write code in Spark Streaming / Structured Streaming in PySpark which will…

Aayush Devgan · 33
3 votes · 1 answer
pyspark structured streaming write to parquet in batches
I am doing some transformations on the Spark Structured Streaming dataframe. I am storing the transformed dataframe as parquet files in HDFS. Now I want the write to HDFS to happen in batches instead of transforming the whole dataframe first…

Y0gesh Gupta · 2,184
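On the last question above: Structured Streaming already writes one set of parquet files per micro-batch, so the practical levers are bounding how much each batch reads and setting a trigger interval. A hedged sketch of the relevant options (option and method names per the Structured Streaming guide; paths, sizes, and the `transformed` dataframe are placeholders, not run here):

```
df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "mytopic")
      .option("maxOffsetsPerTrigger", 10000)      # cap records per micro-batch
      .load())

query = (transformed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/out")
         .option("checkpointLocation", "hdfs:///data/ckpt")
         .trigger(processingTime="1 minute")      # one parquet write per minute
         .start())
```

Together, the offset cap and the trigger interval control both the size and the cadence of each batch written to HDFS.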