Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. Direct Stream approach provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
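A minimal direct-stream setup in PySpark (sketched with the 0-8 integration, which is the one exposed to Python; the broker address and topic name are placeholders) shows the 1:1 partition mapping and offset access the description mentions:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="direct-kafka-example")
ssc = StreamingContext(sc, 10)  # 10-second batches

# One Spark partition is created per Kafka partition of "mytopic".
stream = KafkaUtils.createDirectStream(
    ssc, ["mytopic"], {"metadata.broker.list": "localhost:9092"})

# Offsets must be captured in the first transformation on the stream.
offset_ranges = []

def store_offsets(rdd):
    global offset_ranges
    offset_ranges = rdd.offsetRanges()
    return rdd

stream.transform(store_offsets).foreachRDD(
    lambda rdd: print([(o.topic, o.partition, o.fromOffset, o.untilOffset)
                       for o in offset_ranges]))

ssc.start()
ssc.awaitTermination()
```

Running this requires a live Kafka broker; it is a sketch of the pattern, not a complete application.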

250 questions
4
votes
3 answers

PySpark and Kafka "Set are gone. Some data may have been missed.."

I'm running PySpark on a Spark cluster in local mode, and I'm trying to write a streaming DataFrame to a Kafka topic. When I run the query, I get the following message: java.lang.IllegalStateException: Set(topicname-0) are gone. Some data may have…
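This IllegalStateException means the offsets recorded in the query's checkpoint no longer exist in Kafka (for example, the topic was deleted and recreated, or retention expired the data). A common workaround, sketched here with placeholder broker and topic names, is to clear the checkpoint or set `failOnDataLoss` to false on the source:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-sink-example").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      # Don't fail the query when checkpointed offsets no longer exist
      # (topic recreated or data aged out); the missed data is skipped.
      .option("failOnDataLoss", "false")
      .load())

# The Kafka sink expects a "value" column (and optionally "key").
query = (df.selectExpr("CAST(key AS STRING) AS key",
                       "CAST(value AS STRING) AS value")
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "output-topic")
         .option("checkpointLocation", "/tmp/kafka-sink-checkpoint")
         .start())
```

Note that `failOnDataLoss=false` silently drops whatever range of records was lost, so it trades correctness for availability.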
4
votes
1 answer

How do partitions work in Spark Streaming?

I am working on performance improvement of a Spark Streaming application. How does partitioning work in a streaming environment? Is it the same as loading a file into Spark, or does it always create only one partition, making it work on only one core of…
4
votes
1 answer

Some Kafka params in spark-streaming-kafka-0-10_2.10 are fixed to none

I am using spark-streaming-kafka-0-10_2.10 version 2.0.2 for a Spark Streaming job. I get warnings like this: 17/10/10 16:42:25 WARN KafkaUtils: overriding enable.auto.commit to false for executor 17/10/10 16:42:25 WARN KafkaUtils: overriding…
deerluffy
  • 41
  • 4
4
votes
2 answers

Spark streaming applications subscribing to same kafka topic

I am new to Spark and Kafka and have a slightly different usage pattern of Spark Streaming with Kafka. I am using spark-core_2.10 - 2.1.1, spark-streaming_2.10 - 2.1.1, spark-streaming-kafka-0-10_2.10 - 2.0.0, kafka_2.10 - 0.10.1.1. Continuous…
4
votes
3 answers

spark streaming + kafka - spark session API

I would appreciate your help running a Spark Streaming program using Spark 2.0.2. The run fails with "java.lang.ClassNotFoundException: Failed to find data source: kafka". I modified the POM file as below. The Spark session is created, but it errors when the load from…
Aavik
  • 967
  • 19
  • 48
4
votes
2 answers

Spark Streaming Kafka Consumer

I'm attempting to set up a simple Spark Streaming app that will read messages from a Kafka topic. After much work I am at this stage, but I get the exceptions shown below. Code: public static void main(String[] args) throws Exception { String…
Ken Alton
  • 686
  • 1
  • 9
  • 21
4
votes
2 answers

Apache Kafka and Spark Streaming

I'm reading through this blog post: http://blog.jaceklaskowski.pl/2015/07/20/real-time-data-processing-using-apache-kafka-and-spark-streaming.html It discusses using Spark Streaming and Apache Kafka to do some near-real-time processing. I…
4
votes
2 answers

Kafka Spark streaming: unable to read messages

I am integrating Kafka and Spark using spark-streaming. On the Kafka side, I created a topic: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test I am publishing messages to Kafka and…
aiman
  • 1,049
  • 19
  • 57
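A minimal Structured Streaming consumer for a topic like the `test` topic above can look like the following sketch (it assumes the spark-sql-kafka-0-10 package is on the classpath and a broker is running on localhost):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-console-reader").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test")
      .option("startingOffsets", "earliest")  # read from the beginning
      .load())

# key/value arrive as binary; cast them to strings before printing.
query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .format("console")
         .start())

query.awaitTermination()
```

If no messages appear, the usual suspects are a missing Kafka package on the classpath, a wrong bootstrap address, or `startingOffsets` defaulting to `latest` while the producer stopped before the query started.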
3
votes
1 answer

MicroBatchExecution: Query terminated with error UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

Here I am trying to execute Structured Streaming with Apache Kafka, but it is not working and fails with an error (ERROR MicroBatchExecution: Query [id = daae4c34-9c8a-4c28-9e2e-88e5fcf3d614, runId = ca57d90c-d584-41d3-a8de-6f9534ead0a0]…
3
votes
1 answer

How to specify the group id of kafka consumer for spark structured streaming?

I would like to run 2 Spark Structured Streaming jobs in the same EMR cluster to consume the same Kafka topic. Both jobs are in the running status. However, only one job can get the Kafka data. My configuration for the Kafka part is as follows. …
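The symptom of only one job receiving data is what happens when two queries share a single Kafka consumer group: the broker splits the topic's partitions between them instead of giving each a full copy. Structured Streaming normally generates a unique group id per query; on Spark 3.0+ it can be overridden explicitly, as in this sketch (topic and group names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("job-a").getOrCreate()

# Each independent query should use its own consumer group. On Spark 3.0+
# kafka.group.id can be set per query; on 2.x, leave it unset and let
# Spark generate a unique group id so both jobs see all the data.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "shared-topic")
      .option("kafka.group.id", "job-a-group")  # Spark 3.0+ only
      .load())
```

Giving the second job a different `kafka.group.id` (or removing the option entirely) lets both queries independently consume the whole topic.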
3
votes
0 answers

Spark Streaming: Many queued batches after a long time running without problems

We wrote a Spark Streaming application that receives Kafka messages (backpressure enabled and spark.streaming.kafka.maxRatePerPartition set), maps the DStream into a Dataset, and writes these Datasets to Parquet files (inside DStream.foreachRDD) at…
3
votes
0 answers

Issue reading multiline kafka messages in Spark

I'm trying to read a multiline JSON message on Spark 2.0.0, but I'm getting _corrupt_record. The code works fine for single-line JSON, and also when I read the multiline JSON as wholeTextFile in the REPL. stream.map(record => (record.key(),…
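With Structured Streaming, each Kafka record's value can be parsed with `from_json` against an explicit schema; the single-line limitation applies to the file-based JSON reader, not to parsing one whole message value. A sketch with a hypothetical schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("json-kafka").getOrCreate()

# Hypothetical schema for the incoming messages.
schema = StructType([
    StructField("id", LongType()),
    StructField("payload", StringType()),
])

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "json-topic")
      .load())

# Each Kafka value is one JSON document, even if it contains newlines;
# values that fail to parse yield null rather than a _corrupt_record row.
parsed = (df.select(from_json(col("value").cast("string"), schema)
                    .alias("data"))
            .select("data.*"))
```

The key difference from `spark.read.json` is that the message boundary comes from Kafka, so embedded newlines never split a record.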
3
votes
1 answer

Dispatch and initiate Spark Jobs on Kafka message

I have an external data source which sends data through Kafka. In fact, these are not the real data, but links to the data: "type": "job_type_1" "urls": [ "://some_file" "://some_file" ] There is a single topic, but it contains a type field; basing…
dr11
  • 5,166
  • 11
  • 35
  • 77
3
votes
1 answer

Reading avro messages from Kafka in spark streaming/structured streaming

I am using PySpark for the first time. Spark version: 2.3.0, Kafka version: 2.2.0. I have a Kafka producer which sends nested data in Avro format, and I am trying to write code in Spark Streaming / Structured Streaming in PySpark which will…
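For Structured Streaming, Avro values can be decoded with `from_avro`, though in Python that function only arrived in Spark 3.0 (with the spark-avro package on the classpath) — on Spark 2.3 a UDF-based decoder is needed instead. A sketch with a hypothetical writer schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro  # Spark 3.0+ for Python
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("avro-kafka").getOrCreate()

# Hypothetical writer schema for the nested Avro records.
avro_schema = """
{
  "type": "record", "name": "Event",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "user", "type": {
      "type": "record", "name": "User",
      "fields": [{"name": "name", "type": "string"}]}}
  ]
}
"""

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "avro-topic")
      .load())

# Decode the binary value column into a nested struct, then flatten it.
decoded = (df.select(from_avro(col("value"), avro_schema).alias("event"))
             .select("event.id", "event.user.name"))
```

If the producer uses a schema registry, the payload usually carries a 5-byte header that must be stripped before `from_avro` will parse it — a frequent source of decode failures.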
3
votes
1 answer

pyspark structured streaming write to parquet in batches

I am doing some transformations on a Spark Structured Streaming DataFrame. I am storing the transformed DataFrame as Parquet files in HDFS. Now I want the write to HDFS to happen in batches instead of transforming the whole DataFrame first…
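The file sink already writes one set of Parquet files per micro-batch, so batching falls out of the trigger interval. A sketch (paths, topic, and interval are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-sink").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load())

# Each trigger produces one micro-batch, transformed and written as a
# separate set of Parquet files; the interval controls batch size.
query = (df.selectExpr("CAST(value AS STRING) AS value")
         .writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events_parquet")
         .option("checkpointLocation", "hdfs:///chk/events_parquet")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```

For per-batch control beyond this (custom file sizes, compaction), `foreachBatch` on Spark 2.4+ hands each micro-batch over as a regular DataFrame to write however you like.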