Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. Direct Stream approach provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
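A minimal direct-stream setup in PySpark (sketched with the 0-8 integration, which is the one exposed to Python; the broker address and topic name are placeholders) shows the 1:1 partition mapping and offset access the description mentions:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="direct-kafka-example")
ssc = StreamingContext(sc, 10)  # 10-second batches

# One Spark partition is created per Kafka partition of "mytopic".
stream = KafkaUtils.createDirectStream(
    ssc, ["mytopic"], {"metadata.broker.list": "localhost:9092"})

# Offsets must be captured in the first transformation on the stream.
offset_ranges = []

def store_offsets(rdd):
    global offset_ranges
    offset_ranges = rdd.offsetRanges()
    return rdd

stream.transform(store_offsets).foreachRDD(
    lambda rdd: print([(o.topic, o.partition, o.fromOffset, o.untilOffset)
                       for o in offset_ranges]))

ssc.start()
ssc.awaitTermination()
```

Running this requires a live Kafka broker; it is a sketch of the pattern, not a complete application.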

250 questions
4
votes
3 answers

PySpark and Kafka "Set are gone. Some data may have been missed.."

I'm running PySpark on a Spark cluster in local mode, and I'm trying to write a streaming DataFrame to a Kafka topic. When I run the query, I get the following message: java.lang.IllegalStateException: Set(topicname-0) are gone. Some data may have…
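This IllegalStateException means the offsets recorded in the query's checkpoint no longer exist in Kafka (for example, the topic was deleted and recreated, or retention expired the data). A common workaround, sketched here with placeholder broker and topic names, is to clear the checkpoint or set `failOnDataLoss` to false on the source:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-sink-example").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      # Don't fail the query when checkpointed offsets no longer exist
      # (topic recreated or data aged out); the missed data is skipped.
      .option("failOnDataLoss", "false")
      .load())

# The Kafka sink expects a "value" column (and optionally "key").
query = (df.selectExpr("CAST(key AS STRING) AS key",
                       "CAST(value AS STRING) AS value")
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "output-topic")
         .option("checkpointLocation", "/tmp/kafka-sink-checkpoint")
         .start())
```

Note that `failOnDataLoss=false` silently drops whatever range of records was lost, so it trades correctness for availability.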
4
votes
1 answer

How do partitions work in Spark Streaming?

I am working on performance improvement of a Spark Streaming application. How does partitioning work in a streaming environment? Is it the same as loading a file into Spark, or does it always create only one partition, making it work on only one core of…
4
votes
1 answer

Some Kafka params in spark-streaming-kafka-0-10_2.10 are fixed to none

I am using spark-streaming-kafka-0-10_2.10 version 2.0.2 for a Spark Streaming job. I get warnings like this: 17/10/10 16:42:25 WARN KafkaUtils: overriding enable.auto.commit to false for executor 17/10/10 16:42:25 WARN KafkaUtils: overriding…
deerluffy
  • 41
  • 4
4
votes
2 answers

Spark streaming applications subscribing to same kafka topic

I am new to Spark and Kafka and have a slightly different usage pattern of Spark Streaming with Kafka. I am using spark-core_2.10 - 2.1.1, spark-streaming_2.10 - 2.1.1, spark-streaming-kafka-0-10_2.10 - 2.0.0, kafka_2.10 - 0.10.1.1. Continuous…
4
votes
3 answers

spark streaming + kafka - spark session API

I would appreciate your help running a Spark Streaming program using Spark 2.0.2. The run fails with "java.lang.ClassNotFoundException: Failed to find data source: kafka". I modified the POM file as below. The Spark session is created, but it errors when the load from…
Aavik
  • 967
  • 19
  • 48
4
votes
2 answers

Spark Streaming Kafka Consumer

I'm attempting to set up a simple Spark Streaming app that will read messages from a Kafka topic. After much work I am at this stage, but I get the exceptions shown below. Code: public static void main(String[] args) throws Exception { String…
Ken Alton
  • 686
  • 1
  • 9
  • 21
4
votes
2 answers

Apache Kafka and Spark Streaming

I'm reading through this blog post: http://blog.jaceklaskowski.pl/2015/07/20/real-time-data-processing-using-apache-kafka-and-spark-streaming.html It discusses using Spark Streaming and Apache Kafka to do some near-real-time processing. I…
4
votes
2 answers

Kafka Spark streaming: unable to read messages

I am integrating Kafka and Spark using spark-streaming. On the Kafka side, I created a topic: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test I am publishing messages to Kafka and…
aiman
  • 1,049
  • 19
  • 57
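A minimal Structured Streaming consumer for a topic like the `test` topic above can look like the following sketch (it assumes the spark-sql-kafka-0-10 package is on the classpath and a broker is running on localhost):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-console-reader").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test")
      .option("startingOffsets", "earliest")  # read from the beginning
      .load())

# key/value arrive as binary; cast them to strings before printing.
query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .format("console")
         .start())

query.awaitTermination()
```

If no messages appear, the usual suspects are a missing Kafka package on the classpath, a wrong bootstrap address, or `startingOffsets` defaulting to `latest` while the producer stopped before the query started.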
3
votes
1 answer

MicroBatchExecution: Query terminated with error UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

Here I am trying to execute Structured Streaming with Apache Kafka, but it is not working and fails with an error (ERROR MicroBatchExecution: Query [id = daae4c34-9c8a-4c28-9e2e-88e5fcf3d614, runId = ca57d90c-d584-41d3-a8de-6f9534ead0a0]…
3
votes
1 answer

How to specify the group id of kafka consumer for spark structured streaming?

I would like to run 2 Spark Structured Streaming jobs in the same EMR cluster to consume the same Kafka topic. Both jobs are in the running status. However, only one job can get the Kafka data. My configuration for the Kafka part is as follows. …
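The symptom of only one job receiving data is what happens when two queries share a single Kafka consumer group: the broker splits the topic's partitions between them instead of giving each a full copy. Structured Streaming normally generates a unique group id per query; on Spark 3.0+ it can be overridden explicitly, as in this sketch (topic and group names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("job-a").getOrCreate()

# Each independent query should use its own consumer group. On Spark 3.0+
# kafka.group.id can be set per query; on 2.x, leave it unset and let
# Spark generate a unique group id so both jobs see all the data.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "shared-topic")
      .option("kafka.group.id", "job-a-group")  # Spark 3.0+ only
      .load())
```

Giving the second job a different `kafka.group.id` (or removing the option entirely) lets both queries independently consume the whole topic.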
3
votes
0 answers

Spark Streaming: Many queued batches after a long time running without problems

We wrote a Spark Streaming application that receives Kafka messages (backpressure enabled and spark.streaming.kafka.maxRatePerPartition set), maps the DStream into a Dataset, and writes these Datasets to Parquet files (inside DStream.foreachRDD) at…
3
votes
0 answers

Issue reading multiline kafka messages in Spark

I'm trying to read a multiline JSON message on Spark 2.0.0, but I'm getting _corrupt_record. The code works fine for single-line JSON, and also when I read the multiline JSON as wholeTextFile in the REPL. stream.map(record => (record.key(),…
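With Structured Streaming, each Kafka record's value can be parsed with `from_json` against an explicit schema; the single-line limitation applies to the file-based JSON reader, not to parsing one whole message value. A sketch with a hypothetical schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("json-kafka").getOrCreate()

# Hypothetical schema for the incoming messages.
schema = StructType([
    StructField("id", LongType()),
    StructField("payload", StringType()),
])

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "json-topic")
      .load())

# Each Kafka value is one JSON document, even if it contains newlines;
# values that fail to parse yield null rather than a _corrupt_record row.
parsed = (df.select(from_json(col("value").cast("string"), schema)
                    .alias("data"))
            .select("data.*"))
```

The key difference from `spark.read.json` is that the message boundary comes from Kafka, so embedded newlines never split a record.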
3
votes
1 answer

Dispatch and initiate Spark Jobs on Kafka message

I have an external data source which sends data through Kafka. In fact, these are not the real data, but links to the data: "type": "job_type_1" "urls": [ "://some_file" "://some_file" ] There is a single topic, but it contains a type field; basing…
dr11
  • 5,166
  • 11
  • 35
  • 77
3
votes
1 answer

Reading avro messages from Kafka in spark streaming/structured streaming

I am using PySpark for the first time. Spark version: 2.3.0, Kafka version: 2.2.0. I have a Kafka producer which sends nested data in Avro format, and I am trying to write code in Spark Streaming / Structured Streaming in PySpark which will…
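For Structured Streaming, Avro values can be decoded with `from_avro`, though in Python that function only arrived in Spark 3.0 (with the spark-avro package on the classpath) — on Spark 2.3 a UDF-based decoder is needed instead. A sketch with a hypothetical writer schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro  # Spark 3.0+ for Python
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("avro-kafka").getOrCreate()

# Hypothetical writer schema for the nested Avro records.
avro_schema = """
{
  "type": "record", "name": "Event",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "user", "type": {
      "type": "record", "name": "User",
      "fields": [{"name": "name", "type": "string"}]}}
  ]
}
"""

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "avro-topic")
      .load())

# Decode the binary value column into a nested struct, then flatten it.
decoded = (df.select(from_avro(col("value"), avro_schema).alias("event"))
             .select("event.id", "event.user.name"))
```

If the producer uses a schema registry, the payload usually carries a 5-byte header that must be stripped before `from_avro` will parse it — a frequent source of decode failures.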
3
votes
1 answer

pyspark structured streaming write to parquet in batches

I am doing some transformations on a Spark Structured Streaming DataFrame. I am storing the transformed DataFrame as Parquet files in HDFS. Now I want the write to HDFS to happen in batches instead of transforming the whole DataFrame first…
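The file sink already writes one set of Parquet files per micro-batch, so batching falls out of the trigger interval. A sketch (paths, topic, and interval are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-sink").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load())

# Each trigger produces one micro-batch, transformed and written as a
# separate set of Parquet files; the interval controls batch size.
query = (df.selectExpr("CAST(value AS STRING) AS value")
         .writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events_parquet")
         .option("checkpointLocation", "hdfs:///chk/events_parquet")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```

For per-batch control beyond this (custom file sizes, compaction), `foreachBatch` on Spark 2.4+ hands each micro-batch over as a regular DataFrame to write however you like.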