Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
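For reference, a minimal Direct Stream setup with the spark-streaming-kafka-0-10 artifact looks roughly like this (broker address, group id, and topic name are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val ssc = new StreamingContext(new SparkConf().setAppName("direct-stream-example"), Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",            // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",                      // placeholder group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// One Spark partition per Kafka partition; each record carries topic/partition/offset
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams))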
Questions tagged [spark-streaming-kafka]
250 questions
0
votes
0 answers
Distinct operation in Spark Structured Streaming with a window operation
I want to implement a distinct operation in Spark Structured Streaming code.
I have already watermarked and windowed the stream, but Spark is still not able to execute it.
FYI - distinct comes under the list of unsupported operations in Spark Streaming, but I…
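For context, Structured Streaming does support streaming deduplication through dropDuplicates combined with a watermark. A minimal sketch, assuming a stream named events with id and eventTime columns:

val deduped = events
  .withWatermark("eventTime", "10 minutes") // bound the state kept for deduplication
  .dropDuplicates("id", "eventTime")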

Deval
- 97
- 7
0
votes
1 answer
Not able to read data through Kafka Spark Streaming in PySpark
I am working on creating a basic streaming app which reads streaming data from Kafka and processes the data. Below is the code I am trying in PySpark
spark = SparkSession.builder.appName("testing").getOrCreate()
df = spark \
.readStream…
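A complete minimal version of this read (in Scala here; the PySpark options are identical) needs at least the kafka.bootstrap.servers and subscribe options, plus the spark-sql-kafka-0-10 package on the classpath; broker and topic names below are placeholders:

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "test-topic")                   // placeholder topic
  .option("startingOffsets", "earliest")
  .load()

// key/value arrive as binary; cast them before use
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()
  .awaitTermination()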

Nikhil
- 101
- 2
- 13
0
votes
1 answer
Spark Structured Streaming job not processing stages and appearing to hang
I am running a streaming application that processes data from Kafka to Kafka using Spark.
If I use the latest offsets, it works as expected and runs without any issue,
but in the source we have done a bulk transaction (200,000 records) and when using earliest…
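A common mitigation when starting from earliest against a large backlog is to cap each micro-batch with maxOffsetsPerTrigger; a sketch with placeholder names:

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "input-topic")                  // placeholder topic
  .option("startingOffsets", "earliest")
  .option("maxOffsetsPerTrigger", "10000")             // at most 10,000 records per micro-batch
  .load()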

Sonu
- 77
- 11
0
votes
1 answer
Offset management in Spark Streaming
As far as I understand, for a Spark streaming application (Structured Streaming or otherwise), to manually manage the offsets, Spark provides the feature of checkpointing, where you just have to configure the checkpoint location (HDFS most of the time)…
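In Structured Streaming that looks like the sketch below: offsets and state are tracked under the configured checkpoint location, and the query resumes from them on restart (broker, topic, and path are placeholders):

val query = df.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")        // placeholder broker
  .option("topic", "output-topic")                            // placeholder topic
  .option("checkpointLocation", "hdfs:///checkpoints/my-app") // offsets + state live here
  .start()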

Gaurav Gupta
- 159
- 1
- 17
0
votes
0 answers
Problem integrating Kafka and Spark Streaming: no messages received in Spark Streaming
My Spark streaming context successfully subscribes to my Kafka topic, where my tweets are streamed by my Twitter producer, but no messages are being streamed from the topic in my Spark Streaming application!
Here is my code
def main(args: Array[String]){
val…
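One frequent cause of a silent DStream job is never starting the context. Assuming a stream built as in the Direct Stream sketch at the top of this page, the tail of main should look like:

stream.map(_.value).print() // an output action, so the stream has work to do

ssc.start()                 // nothing is consumed until start() is called
ssc.awaitTermination()      // keep the application alive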

bigdata1800
- 11
- 1
0
votes
0 answers
spark-streaming-kafka-0-10 does not support message handler
My use case is to print the offset, partition, and topic for each record that has been read from Kafka in a Spark Streaming application.
Currently my code to create the DStream looks like this.
val stream: InputDStream[ConsumerRecord[String,…
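In the 0-10 API, the metadata the old message handler exposed is available directly on each ConsumerRecord, so the use case can be sketched as:

stream.foreachRDD { rdd =>
  rdd.foreach { record =>
    // topic, partition, and offset come straight off the ConsumerRecord
    println(s"topic=${record.topic} partition=${record.partition} " +
      s"offset=${record.offset} value=${record.value}")
  }
}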

amarnath harish
- 945
- 7
- 24
0
votes
0 answers
How to use environment variables in Spark when deployed in cluster mode?
When I set environment variables using IntelliJ, the code below works, but when I deploy the code with spark-submit it does not work, since the environment variables do not exist across the cluster.
import com.hepsiburada.util.KafkaUtil
import…
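One common workaround is to pass values through Spark configuration rather than the shell environment, e.g. spark-submit --conf spark.executorEnv.KAFKA_BROKERS=broker:9092 (the variable name here is a placeholder); a sketch of reading it with a fallback:

// Prefer the OS environment, fall back to Spark conf, then to a default
val brokers = sys.env.getOrElse("KAFKA_BROKERS",
  spark.sparkContext.getConf.get("spark.executorEnv.KAFKA_BROKERS", "localhost:9092"))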

Enes Uğuroğlu
- 377
- 5
- 16
0
votes
1 answer
Spark Structured Streaming reprocessing already processed records on failure
I am stuck on a very weird issue in Spark Structured Streaming: whenever I shut down the stream and restart it, it processes already processed records again.
I tried to use spark.conf.set("spark.streaming.stopGracefullyOnShutdown", True) but…
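Note that spark.streaming.stopGracefullyOnShutdown applies to the DStream API, not Structured Streaming, where replay of the last in-flight micro-batch after a restart is expected unless the sink is idempotent. A sketch of one idempotent pattern, with placeholder paths:

df.writeStream
  .foreachBatch { (batch: org.apache.spark.sql.DataFrame, batchId: Long) =>
    // Overwriting a batchId-keyed path makes a replayed batch a no-op
    batch.write.mode("overwrite").parquet(s"/data/out/batch=$batchId")
  }
  .option("checkpointLocation", "/checkpoints/app") // placeholder path
  .start()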

Deepak
- 31
- 3
0
votes
0 answers
Spark Streaming provides 2 kinds of streams when integrating with Kafka: 1) Receiver-based 2) Direct. Which kind does Structured Streaming use?
Spark Streaming provides 2 kinds of streams when integrating with Kafka:
Receiver-based
Direct
Which kind of stream does Structured Streaming use when we do spark.readStream.format("kafka")?

Abhinav Kumar
- 210
- 3
- 13
0
votes
1 answer
Spark Structured Streaming from Kafka to Elasticsearch
I want to write a Spark Streaming job from Kafka to Elasticsearch, and I want to detect the schema dynamically while reading it from Kafka.
Can you help me do that?
I know this can be done in Spark batch processing via the line below.
val schema =…
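One workable pattern (assuming JSON-encoded values; broker and topic names are placeholders) is to infer the schema from a one-off batch read of the topic, then apply it to the stream with from_json:

import org.apache.spark.sql.functions.from_json
import spark.implicits._

// One-off batch read to sample the topic and infer a schema
val sampleJson = spark.read.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "events")                       // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING)")
  .as[String]
val schema = spark.read.json(sampleJson).schema

// Apply the inferred schema to the streaming read
val parsed = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", schema).as("data"))
  .select("data.*")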

Siva Samraj
- 37
- 1
- 5
0
votes
1 answer
Available options for a source/destination format of Spark Structured Streaming
When we use the DataStreamReader API for a format in Spark, we specify options for that format using the option/options methods. For example, in the code below, I'm using Kafka as the source and passing the configuration required for the source through…

Scarface
- 359
- 2
- 13
0
votes
1 answer
How do I get the data of one row of a Structured Streaming DataFrame in PySpark?
I have a Kafka broker with a topic connected to Spark Structured Streaming. My topic sends data to my streaming DataFrame, and I'd like to get information on each row of this topic (because I need to compare each row with another database).
If I…
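A streaming DataFrame cannot be iterated directly, but foreachBatch hands over each micro-batch as an ordinary DataFrame whose rows can be inspected (sketched in Scala; PySpark's foreachBatch is analogous, and streamingDf and the column name are placeholders):

streamingDf.writeStream
  .foreachBatch { (batch: org.apache.spark.sql.DataFrame, _: Long) =>
    // collect() is fine for small batches; each row can now be compared
    // against the other database
    batch.collect().foreach { row =>
      val key = row.getAs[String]("key") // placeholder column
      // look up `key` in the other database and compare ...
    }
  }
  .start()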

Donsitoz
- 19
- 5
0
votes
1 answer
How to merge multiple datatypes of a union type in an Avro schema to show one data type in the value field instead of member0/member1
I have the following avro schema
{
  "name": "MyClass",
  "type": "record",
  "namespace": "com.acme.avro",
  "fields": [
    {
      "name": "data",
      "type": {
        "type": "map",
        "values": ["int", "string"]
      }
    }
…

Beluga
- 63
- 7
0
votes
0 answers
How to configure thread count on Spark Driver node?
We are running a Spark Streaming job in standalone cluster mode with deploy mode set to client. This streaming job periodically polls messages from a Kafka topic, and the logs generated at the driver node are flushed to a txt file.
After running…

Anoop Deshpande
- 514
- 1
- 6
- 23
0
votes
0 answers
Pivot stream data in Spark
I am reading data from a Kafka topic and I want to pivot the data.
I am using the code below in spark-shell
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val data = spark.readStream.format("kafka")
…
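pivot is not supported on a streaming DataFrame, so a common workaround is to pivot each micro-batch inside foreachBatch. A sketch, assuming data is the fully loaded and parsed stream from the snippet above; column names are placeholders, and spark-shell pre-imports the $ syntax:

import org.apache.spark.sql.functions.first

data.writeStream
  .foreachBatch { (batch: org.apache.spark.sql.DataFrame, _: Long) =>
    batch.groupBy($"id")     // placeholder grouping column
      .pivot("category")     // placeholder pivot column
      .agg(first($"value"))  // placeholder aggregation
      .show()
  }
  .start()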

vishnupriya
- 55
- 9