Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.

250 questions
0 votes · 0 answers

Distinct operation in Spark Structured Streaming with a window operation

I want to implement a distinct operation in Spark Structured Streaming code. I have already applied a watermark and a window, but Spark is still not able to execute it. FYI, distinct comes under the list of unsupported operations in Spark streaming, but I…
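Since distinct() is on the unsupported list, the supported route is dropDuplicates on a key column together with a watermark. A minimal Scala sketch; the streaming DataFrame `events` and the columns `key` and `eventTime` are assumptions:

    // distinct() is unsupported on streaming DataFrames; dropDuplicates on a
    // key plus the watermarked event-time column is the streaming-safe form.
    val deduped = events
      .withWatermark("eventTime", "10 minutes")  // bounds the dedup state
      .dropDuplicates("key", "eventTime")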
0 votes · 1 answer

Not able to read data through Kafka Spark Streaming in PySpark

I am working on a basic streaming app which reads streaming data from Kafka and processes it. Below is the code I am trying in PySpark: spark = SparkSession.builder.appName("testing").getOrCreate() df = spark \ .readStream…
Nikhil
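A frequent cause of such read failures is the Kafka connector jar missing from the session. A minimal Scala sketch of the same read; the broker address and topic name are assumptions, and the spark-sql-kafka-0-10 package must be supplied (e.g. via --packages):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("testing").getOrCreate()

    // Requires org.apache.spark:spark-sql-kafka-0-10 on the classpath.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumption
      .option("subscribe", "test-topic")                   // assumption
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")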
0 votes · 1 answer

Spark Structured Streaming job not processing stages and appearing to hang

I am running a streaming application that processes data from Kafka to Kafka using Spark. If I use latest, it works as expected and runs without any issue, but at the source we have done a bulk transaction (200,000 records) and, using earliest…
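With startingOffsets set to earliest, the whole backlog lands in the first micro-batch unless it is capped. One hedged mitigation is the maxOffsetsPerTrigger source option, which throttles how many offsets each batch reads; broker, topic, and the cap value here are assumptions:

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumption
      .option("subscribe", "source-topic")              // assumption
      .option("startingOffsets", "earliest")
      // Cap each micro-batch instead of pulling the 200k backlog at once.
      .option("maxOffsetsPerTrigger", "10000")
      .load()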
0 votes · 1 answer

Offset management in Spark Streaming

As far as I understand, for a Spark streaming application (Structured Streaming or otherwise), to manually manage the offsets, Spark provides the feature of checkpointing, where you just have to configure the checkpoint location (HDFS most of the time)…
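In Structured Streaming the offsets are tracked under the checkpoint directory configured on the sink, so the query resumes from the last committed batch after a restart. A minimal sketch; the sink topic and checkpoint path are assumptions:

    // Offsets and sink progress live under checkpointLocation; on restart
    // the query picks up from the last committed micro-batch.
    val query = df.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")      // assumption
      .option("topic", "sink-topic")                         // assumption
      .option("checkpointLocation", "hdfs:///ckpt/my-query") // assumption
      .start()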
0 votes · 0 answers

Problem integrating Kafka and Spark Streaming: no messages received in Spark Streaming

My Spark streaming context successfully subscribes to my Kafka topic, where my tweets are streamed by my Twitter producer, but no messages are being streamed from the topic into my Spark streaming app! Here is my code: def main(args: Array[String]) { val…
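Two usual suspects are an auto.offset.reset of latest (records produced before startup are skipped) and a missing ssc.start()/awaitTermination(). A hedged sketch of a working direct stream; broker, group id, and topic name are assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.kafka.common.serialization.StringDeserializer

    val conf = new SparkConf().setAppName("tweets").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",             // assumption
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "tweet-consumer",                      // assumption
      "auto.offset.reset" -> "earliest"                    // read pre-startup records
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("tweets"), kafkaParams))

    stream.map(_.value).print()
    ssc.start()
    ssc.awaitTermination() // without these two calls nothing is consumed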
0 votes · 0 answers

spark-streaming-kafka-0-10 does not support message handler

My use case is to print the offset number, partition, and topic for each record that has been read from Kafka in a Spark Streaming application. Currently my code to create the discretized stream looks like this: val stream: InputDStream[ConsumerRecord[String,…
amarnath harish
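The 0-10 integration removed the messageHandler parameter because each ConsumerRecord already carries the metadata, so a plain transformation replaces it. A sketch assuming the `stream` from the question:

    // ConsumerRecord exposes topic/partition/offset directly, so the old
    // messageHandler becomes an ordinary operation over the stream.
    stream.foreachRDD { rdd =>
      rdd.foreach { record =>
        println(s"topic=${record.topic} partition=${record.partition} " +
                s"offset=${record.offset} value=${record.value}")
      }
    }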
0 votes · 0 answers

How to use environment variables in Spark when deployed in cluster mode?

When I set the environment variable using IntelliJ, the code below works, but when I deploy the code with spark-submit it does not work, since the environment variables do not exist on the entire cluster. import com.hepsiburada.util.KafkaUtil import…
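Environment variables have to be shipped through Spark conf rather than the submitting shell. A hedged sketch; the variable name KAFKA_BROKERS is hypothetical:

    // Submit-side, shown as comments: pass the variable to executors and,
    // on YARN cluster mode, to the driver/application master as well.
    //   spark-submit \
    //     --conf spark.executorEnv.KAFKA_BROKERS=broker:9092 \
    //     --conf spark.yarn.appMasterEnv.KAFKA_BROKERS=broker:9092 ...

    // Application-side: read with a fallback instead of assuming presence.
    val brokers = sys.env.getOrElse("KAFKA_BROKERS", "localhost:9092")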
0 votes · 1 answer

Spark Structured Streaming reprocessing already processed records on failure

I am stuck with a very weird issue in Spark Structured Streaming. Whenever I shut down the stream and restart it, it processes already processed records again. I tried to use spark.conf.set("spark.streaming.stopGracefullyOnShutdown", True) but…
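spark.streaming.stopGracefullyOnShutdown is a DStream-era setting; for Structured Streaming, avoiding replays hinges on keeping the same checkpointLocation across restarts and making the sink idempotent. A sketch with hypothetical paths:

    // Keep checkpointLocation stable between runs; a replayed micro-batch
    // then overwrites instead of duplicating (idempotent sink pattern).
    val query = df.writeStream
      .foreachBatch { (batch: org.apache.spark.sql.DataFrame, batchId: Long) =>
        batch.write.mode("overwrite").parquet(s"/out/batch=$batchId") // assumption
      }
      .option("checkpointLocation", "/ckpt/my-query") // must not change between runs
      .start()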
0 votes · 0 answers

Spark Streaming provides 2 kinds of streams when integrating with Kafka, 1) Receiver-based, 2) Direct. What kind of stream does Structured Streaming use?

Spark Streaming provides 2 kinds of streams when integrating with Kafka: Receiver-based and Direct. What kind of stream does Structured Streaming use when we do spark.readStream.format("kafka")?
0 votes · 1 answer

Spark Structured Streaming from Kafka to Elastic Search

I want to write a Spark Streaming job from Kafka to Elasticsearch. Here I want to detect the schema dynamically while reading it from Kafka. Can you help me do that? I know this can be done in Spark batch processing via the line below: val schema =…
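Streaming reads need the schema up front, so one common workaround is to infer it from a small batch read of the same topic and then apply it to the stream. A hedged sketch; broker and topic are assumptions:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    // 1) Batch-read a sample and let Spark infer the JSON schema.
    val sample = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumption
      .option("subscribe", "events")                    // assumption
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
    val schema = spark.read.json(sample.as[String]).schema

    // 2) Apply the inferred schema to the streaming read.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("data"))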
0 votes · 1 answer

Available options for a source/destination format of Spark structured streaming

When we use the DataStreamReader API for a format in Spark, we specify options for the format using the option/options methods. For example, in the code below, I'm using Kafka as the source and passing the configuration required for the source through…
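The valid options are connector-specific and live in each connector's guide (for Kafka, the Structured Streaming + Kafka integration guide). A few commonly used Kafka source options, with assumed values:

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // required
      .option("subscribe", "topicA,topicB")             // or subscribePattern / assign
      .option("startingOffsets", "earliest")            // latest | earliest | per-partition JSON
      .option("maxOffsetsPerTrigger", "5000")           // throttle micro-batch size
      .option("failOnDataLoss", "false")                // tolerate aged-out offsets
      .load()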
0 votes · 1 answer

How do I get the data of one row of a Structured Streaming DataFrame in PySpark?

I have a Kafka broker with a topic connected to Spark Structured Streaming. My topic sends data to my streaming DataFrame, and I'd like to get information on each row for this topic (because I need to compare each row with another database). If I…
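A streaming DataFrame cannot be collected or iterated directly; foreachBatch hands each micro-batch over as a static DataFrame whose rows can be inspected. A Scala sketch; the column name and the lookup hook are hypothetical:

    val query = df.writeStream
      .foreachBatch { (batch: org.apache.spark.sql.DataFrame, batchId: Long) =>
        // collect() is fine for small batches only; prefer a join for volume.
        batch.collect().foreach { row =>
          val value = row.getAs[String]("value") // column name is an assumption
          // compareWithOtherDatabase(value)     // hypothetical lookup hook
        }
      }
      .option("checkpointLocation", "/ckpt/row-scan") // assumption
      .start()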
0 votes · 1 answer

How to merge multiple data types of a union type in an Avro schema to show one data type in the value field instead of member0/member1

I have the following Avro schema: { "name": "MyClass", "type": "record", "namespace": "com.acme.avro", "fields": [ { "name": "data", "type": { "type": "map", "values": ["int","string"] } } …
0 votes · 0 answers

How to configure thread count on Spark Driver node?

We are running a Spark streaming job in standalone cluster mode with the deploy mode set to client. This streaming job polls messages from a Kafka topic periodically, and the logs generated at the driver node are flushed to a txt file. After running…
0 votes · 0 answers

Pivot stream data in Spark

I am reading data from a Kafka topic and I want to pivot the data. I am using the code below in the Spark shell: import org.apache.spark.sql.types._ import org.apache.spark.sql.functions._ val data = spark.readStream.format("kafka")…
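pivot() is not supported on a streaming DataFrame, but inside foreachBatch each micro-batch is a static DataFrame that can be pivoted normally. A sketch; the columns `id`, `category`, and `amount` are assumptions:

    import org.apache.spark.sql.functions._

    val query = data.writeStream
      .foreachBatch { (batch: org.apache.spark.sql.DataFrame, batchId: Long) =>
        // Pivot the static micro-batch; works per batch, not across batches.
        batch.groupBy("id").pivot("category").agg(sum("amount")).show()
      }
      .option("checkpointLocation", "/ckpt/pivot") // assumption
      .start()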