Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.

250 questions
1
vote
0 answers

How to store data into HDFS using spark streaming

I want to store streaming data in HDFS. My Spark Streaming code captures data from a Kafka topic. I tried this: lines.saveAsHadoopFiles("hdfs://192.168.10.31:9000/user/spark/mystream/", "abc"). This is my code; let me know where to write the code for…
1
vote
1 answer

Spark arbitrary stateful stream aggregation, flatMapGroupsWithState API

I have been a Spark developer for 10 days and am trying to understand Spark's flatMapGroupsWithState API. As I understand it, we pass 2 options to it, which are the timeout configuration. A possible value is GroupStateTimeout.ProcessingTimeTimeout, i.e. kind of an…
1
vote
0 answers

How to read json data using python from kafka topic in apache spark

I am new to Spark. Could you please let me know how to read JSON data using Python from a Kafka topic in Apache Spark? I tried the following code, which worked for single-line JSON but not for multiline JSON: kafkaStream = KafkaUtils.createStream(ssc,…
1
vote
0 answers

Spark Streaming tasks fail with NullPointerException

I'm running a Spark job with kafka-clients:2.2.1 and spark-streaming-kafka-0-10_2.11:2.4.3. The job is running in the following mode: val scc: StreamingContext = new StreamingContext(spark, "180 seconds") val kafkaSink: Broadcast[KafkaSink] = ... val…
Julias
1
vote
0 answers

Spark streaming context not starting

I'm trying to read data from a Kafka topic using Spark Streaming. Below are the code and libraries I am using. The code looks fine, but ssc.start() hangs without printing any ERROR or INFO. Any pointers for the issue will be a great…
1
vote
1 answer

How to fix logging and version compatibility in spark-submit of jar file

I'm trying to submit a jar file for execution on the Spark engine. I'm trying to integrate Spark with Kafka and am using Eclipse to build and export the jar file of the sample code https://github.com/apache/spark/tree/v2.1.1/examples I got two exceptions:…
1
vote
1 answer

spark-submit error Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource

In my spark program, I have this code: val df = spark.readStream .format("kafka") .option("subscribe", "raw_weather") .option("kafka.bootstrap.servers", "s of my brokers") .option("kafka.security.protocol",…
1
vote
1 answer

Unable to read the data from kafka topics using spark streaming

I am trying to read data from a Kafka topic using Spark Streaming. I am able to produce messages to the Kafka topic, but while reading the data from the topic using Spark Streaming I am getting the error message given below: ERROR…
1
vote
0 answers

Spark Streaming readStream unable to read from secure Kafka (EventStreams)

I'm trying to send data from a program to a secure Kafka cluster (EventStreams on IBM Cloud - Cloud Foundry Services); then, in my consumer application (which uses Spark Streaming), I'm trying to read the data from the same Kafka source. Here are…
1
vote
1 answer

Kafka - Spark Streaming Integration: DStreams and Task reuse

I am trying to understand the internals of Spark Streaming (not Structured Streaming), specifically the way tasks see the DStream. I am going over the source code of Spark in scala, here. I understand the call stack: ExecutorCoarseGrainedBackend…
1
vote
0 answers

What could be the reasons for a Spark job to get stuck

I'm running a Spark 2.3.1 standalone cluster. My job consumes mini-batches from Kafka every 2 minutes and writes an aggregation to some store. The job looks as follows: val stream = KafkaUtils.createDirectStream(...) stream.foreachRDD { rdd => …
Julias
1
vote
0 answers

ERROR Utils: Uncaught exception in thread stdout writer for python

I use Spark 2.4.0 with Python and read data from kafka_2.11-2.0.0 (binary, not source). I'm using spark-submit --jars sspark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar script.py. An error message appears in the error report. If anyone can help…
1
vote
1 answer

How to create connection(s) to a Datasource in Spark Streaming for Lookups

I have a use case where we are streaming events, and for each event I have to do some lookups. The lookups are in Redis, and I am wondering what the best way to create the connections is. Spark Streaming would run 40 executors and I have 5 such…
1
vote
0 answers

Spark interaction with Kafka with two different principals

I have the following question. I'm using a Spark Structured Streaming job which reads from one topic and writes to another topic of the same kerberized Kafka cluster. Everything works fine. But my problem is the following: How would I handle the…
Neven
1
vote
2 answers

Deserialize Kafka json message with PySpark Streaming

I have a PySpark application that consumes messages from a Kafka topic. These messages are serialized by org.apache.kafka.connect.json.JsonConverter. I'm using the Confluent Kafka JDBC connector to do this. The issue is, when I consume the messages,…