Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. Direct Stream approach provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
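The Direct Stream approach described above can be sketched as follows (a minimal, hedged example using the legacy DStream API from the spark-streaming-kafka-0-8 package; the broker address and topic name "events" are assumptions, and a running Kafka broker is required):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # spark-streaming-kafka-0-8

sc = SparkContext("local[2]", "DirectStreamSketch")
ssc = StreamingContext(sc, 5)  # 5-second batch interval

# createDirectStream yields one Spark partition per Kafka partition
# and exposes offsets through each RDD's offsetRanges().
stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "localhost:9092"})
stream.pprint()

ssc.start()
ssc.awaitTermination()
```

Note this API was removed in Spark 3.x; Structured Streaming's `readStream.format("kafka")` is the current equivalent.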

250 questions
0
votes
1 answer

Spark Structured Streaming: Output result at the end of Tumbling Window and not the Batch

I want the output of Spark Stream to be sent to the Sink at the end of the Tumbling Window and not at the batch interval. I am reading from a Kafka stream and outputting to another Kafka stream. Code to query and write output is like…
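A common pattern for emitting results only when a tumbling window closes (a hedged sketch, not the asker's code; broker, topic names, and the checkpoint path are assumptions) is append output mode with a watermark, since in append mode a windowed aggregate is written only after the watermark passes the window end:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("TumblingWindowSketch").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "input-topic")
          .load()
          # use the Kafka record timestamp as the event-time column
          .selectExpr("CAST(value AS STRING) AS value", "timestamp"))

# Append mode emits each window once, after the watermark passes its end,
# rather than updating the sink on every micro-batch.
counts = (events
          .withWatermark("timestamp", "0 seconds")
          .groupBy(window(col("timestamp"), "10 minutes"))
          .count())

query = (counts.selectExpr("CAST(window.end AS STRING) AS key",
                           "CAST(count AS STRING) AS value")
         .writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "output-topic")
         .option("checkpointLocation", "/tmp/window-chk")
         .outputMode("append")
         .start())
```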
0
votes
2 answers

Spark Structured Streaming to read nested Kafka Connect jsonConverter message

I have ingested an XML file using the Kafka Connect file-pulse connector 1.5.3. Now I want to read it with Spark Streaming to parse/flatten it, as it is quite nested. The string I read out of Kafka (I used the consumer console to read this out, and put an…
0
votes
2 answers

Docker PySpark cluster container not receiving Kafka streaming from the host?

I have created and deployed a Spark cluster which consists of 4 containers running spark-master, spark-worker, spark-submit, and data-mount-container (to access the script from the local directory). I added the required dependency JARs in all these…
0
votes
0 answers

Py4JJavaError: Job aborted due to stage failure: Task 0 in stage 460.0 failed 4 times

I am getting this weird error in my Spark Streaming code written in PySpark. I tried to debug this code but couldn't find any reason. Below is my code; the name of the file is Script.py: import os from pyspark.sql.types import * import json from pyspark…
0
votes
1 answer

What is the best way to structure a spark structured streaming pipeline?

I'm moving data from my Postgres database to Kafka, and in the middle doing some transformations with Spark. I have 50 tables, and for each table the transformations are totally different from the others. So, I want to know what is the best way to…
0
votes
0 answers

Kafka with Spark Streaming works in local but it doesn't work in Standalone mode

I'm trying to use Spark Streaming with a very simple script like this: from pyspark import SparkContext, SparkConf from pyspark.streaming import StreamingContext from pyspark.streaming.kafka import KafkaUtils sc =…
0
votes
1 answer

Too many KDC calls from KafkaConsumer on Spark streaming

I have a standalone (master=local, for its own reasons) Spark Structured Streaming application that reads from a kerberized Kafka cluster. It works functionally, but it makes too many calls to the KDC to fetch a TGS for each micro-batch execution. Either…
0
votes
2 answers

Unable to consume data from a Kafka topic in Spark

I'm new to Spark & Kafka. In Spark, I'm facing issues while trying to consume data from a Kafka topic, and I'm getting the following error. Can somebody help me please? In the SBT project I added all the dependencies. build.sbt file: name :=…
0
votes
1 answer

RecordTooLargeException in Spark *Structured* Streaming

I keep getting this error message: The message is 1169350 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. As indicated in other StackOverflow posts, I am trying to set…
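For context, producer-side settings such as max.request.size must be passed to the Kafka sink with a "kafka." prefix to reach the underlying client; a hedged configuration sketch (broker, topic, the 10 MB limit, and the source DataFrame are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MaxRequestSizeSketch").getOrCreate()

df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .load())

query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "big-messages")
         # producer configs need the "kafka." prefix to be forwarded
         .option("kafka.max.request.size", "10485760")
         .option("checkpointLocation", "/tmp/big-msg-chk")
         .start())
```

Note that the broker-side message.max.bytes may also need raising; otherwise the broker rejects what the producer now allows.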
0
votes
1 answer

Spark Structured Streaming reads data twice per micro-batch. How to avoid it?

I have a very strange issue with Spark Structured Streaming: it creates two Spark jobs for every micro-batch and, as a result, reads data from Kafka twice. Here is a simple code snippet: import org.apache.hadoop.fs.{FileSystem,…
0
votes
1 answer

I have a problem when working with readStream().format("kafka")

Please help fix this error: 20/04/09 18:38:44 ERROR MicroBatchExecution: Query [id = 9f3cbbf6-85a8-4aed-89c6-f5d3ff9c40fa, runId = 73c071c6-e222-4760-a750-393666a298af] terminated with error java.lang.ClassCastException:…
0
votes
1 answer

Fetch kafka headers in spark 2.4.X

How do I get Kafka header fields (which were introduced in Kafka 0.11+) in Spark Structured Streaming? I see the headers implementation was added in Spark 3.0 but not in 2.4.5, and by default spark-sql-kafka-0-10 uses kafka-clients 2.0. If it…
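In Spark 3.0+, headers are exposed through the includeHeaders read option (a sketch; the broker and topic are assumptions, and this option does not exist on 2.4.x):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("HeadersSketch").getOrCreate()

df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("includeHeaders", "true")  # Spark 3.0+: adds a 'headers' column
      .load())

# 'headers' is an array of (key: string, value: binary) structs
query = (df.selectExpr("key", "value", "headers")
         .writeStream.format("console")
         .option("checkpointLocation", "/tmp/headers-chk")
         .start())
```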
0
votes
1 answer

Extracting nested JSON values in Spark Streaming Java

How should I parse JSON messages from Kafka in Spark Streaming? I'm converting a JavaRDD to a Dataset and extracting the values from there. I found success in extracting top-level values; however, I'm not able to extract nested JSON values such as "host.name" and…
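As a plain-Python illustration of the nested extraction the question attempts (outside Spark, for clarity; the message shape is a made-up example), a dotted path like "host.name" can be resolved by walking the parsed object — the Spark equivalent would use from_json with a nested schema or a "host.name" column reference:

```python
import json

def extract(payload: str, dotted_path: str):
    """Walk a parsed JSON object following a dotted path like 'host.name'."""
    node = json.loads(payload)
    for key in dotted_path.split("."):
        node = node[key]  # descend one level per path segment
    return node

msg = '{"host": {"name": "web-1", "ip": "10.0.0.5"}, "level": "INFO"}'
print(extract(msg, "host.name"))  # → web-1
```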
0
votes
1 answer

Does Kafka Direct Stream create a consumer group by itself (as it does not care about the group.id property given in the application)?

Let us say I have just launched a Kafka direct stream + spark streaming application. For the first batch, the Streaming Context in the driver program connects to the Kafka and fetches startOffset and endOffset. Then, it launches a spark job with…
0
votes
1 answer

Writing data from Kafka to Hive using PySpark - stuck

I'm quite new to Spark and started with PySpark. I am learning to push data from Kafka to Hive using PySpark: from pyspark.sql import SparkSession from pyspark.sql.functions import explode from pyspark.sql.functions import * from…
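A hedged sketch of the Kafka-to-Hive path this question describes (broker, topic, and table names are assumptions, and Hive support must be enabled in the session): since there is no built-in streaming Hive sink, a common workaround is foreachBatch, which hands each micro-batch to the ordinary batch writer:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("KafkaToHiveSketch")
         .enableHiveSupport().getOrCreate())

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load()
       .selectExpr("CAST(value AS STRING) AS value"))

# foreachBatch (Spark 2.4+) writes each micro-batch with the batch API,
# so saveAsTable can target a Hive-managed table.
query = (raw.writeStream
         .foreachBatch(lambda batch_df, _batch_id:
                       batch_df.write.mode("append").saveAsTable("default.events"))
         .option("checkpointLocation", "/tmp/kafka-to-hive-chk")
         .start())
```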