Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. Direct Stream approach provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.

250 questions
1
vote
0 answers

Spark Streaming Kafka consumer auto close

I don't want to use one consumer for all topics; I want to use this method to improve consumption efficiency: val kafkaParams = Map( ConsumerConfig.GROUP_ID_CONFIG -> group, ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers, …
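A minimal sketch of the per-topic direct-stream setup the excerpt is reaching for, assuming the spark-streaming-kafka-0-10 integration; broker, group, and topic names are placeholders:

```scala
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("per-topic-consumers")
val ssc  = new StreamingContext(conf, Seconds(10))

// Placeholder connection settings (the excerpt's `group` and `brokers` values).
val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG        -> "broker1:9092",
  ConsumerConfig.GROUP_ID_CONFIG                 -> "my-group",
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG   -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
)

// One direct stream per topic rather than a single Subscribe over all topics;
// each stream is then scheduled and consumed independently.
val streams = Seq("topicA", "topicB").map { topic =>
  KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq(topic), kafkaParams)
  )
}
streams.foreach(_.map(_.value).print())

ssc.start()
ssc.awaitTermination()
```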
1
vote
0 answers

kafka stream with AWS Glue - authorization issue

I want to stream a Kafka topic from a Glue job, but I get the following error: StreamingQueryException: Not authorized to access topics: [topic_name] This is my current script: # Script generated for node Kafka Stream dataframe_KafkaStream_node1 =…
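This error usually comes from the Kafka broker's ACLs rather than from Glue itself. A hedged Structured Streaming sketch, assuming SASL/SCRAM authentication; all hostnames and credentials are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("glue-kafka-sasl").getOrCreate()

// Kafka client security settings must be passed with the "kafka." prefix;
// the mechanism and credentials below are placeholders for the real cluster's.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9096")
  .option("subscribe", "topic_name")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.mechanism", "SCRAM-SHA-512")
  .option("kafka.sasl.jaas.config",
    """org.apache.kafka.common.security.scram.ScramLoginModule required
      |username="user" password="secret";""".stripMargin)
  .load()
```

If the client settings are already correct, the fix is usually broker-side: an ACL granting Read on both the topic and the consumer group.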
1
vote
0 answers

pyspark ml training ALS: No ratings available from MapPartitionsRDD

I'm trying to train ALS with data from each Kafka batch using Spark Streaming, and I'm facing the error below. I think it's because the rating column is negative or otherwise invalid (such as a wrong data type), so I filtered it and cast it to double, but…
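A sketch of the sanitizing step the asker describes, with hypothetical column names; ALS raises "No ratings available" when the batch ends up empty after filtering, so it is worth guarding for that case:

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("als-per-batch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for one parsed micro-batch from Kafka.
val raw = Seq((1, 10, "4.0"), (2, 11, ""), (3, 12, "5.0"))
  .toDF("userId", "itemId", "rating")

val ratings = raw
  .withColumn("rating", $"rating".cast("double"))  // bad strings become null
  .filter($"rating".isNotNull && $"rating" >= 0)

// ALS fails with "No ratings available" on empty input, which can happen
// when every row of a micro-batch is filtered out; guard for it.
if (!ratings.isEmpty) {
  val model = new ALS()
    .setUserCol("userId")
    .setItemCol("itemId")
    .setRatingCol("rating")
    .fit(ratings)
}
```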
1
vote
1 answer

Spark speculative tasks and its performance overhead

I am currently exploring Spark's speculative tasks option. Below is the configuration I am planning to use. I am reading the data from Kafka, and using repartition() I create around 200+ tasks in my streaming code. …
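For reference, the standard speculation knobs look like this; the values below are illustrative, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("speculative-streaming")
  .config("spark.speculation", "true")
  .config("spark.speculation.interval", "100ms")  // how often to check for stragglers
  .config("spark.speculation.multiplier", "1.5")  // straggler = 1.5x the median task time
  .config("spark.speculation.quantile", "0.9")    // only after 90% of tasks in a stage finish
  .getOrCreate()
```

With 200+ short streaming tasks, the quantile and multiplier dominate the overhead: settings that are too aggressive relaunch tasks that would have finished anyway.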
1
vote
1 answer

KafkaUtils.createDirectStream gives an error

I changed a line from createStream to createDirectStream, since the new library does not support createStream. I checked it from here: https://codewithgowtham.blogspot.com/2022/02/spark-streaming-kafka-cassandra-end-to.html scala> val lines =…
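In the 0-10 integration, the direct-stream call takes a location strategy and a consumer strategy. A spark-shell style sketch, assuming spark-streaming-kafka-0-10 is on the classpath; broker, group, and topic values are placeholders:

```scala
// The receiver-based createStream belongs to the retired 0-8 integration.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(5))   // sc is the shell's SparkContext

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",
  "group.id"           -> "demo",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer]
)

val lines = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("test"), kafkaParams)
).map(_.value)
```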
1
vote
0 answers

Spark task failure with ClassCastException [C cannot be cast to [J, at org.apache.spark.unsafe.memory.HeapMemoryAllocator.allocate

We have a Java Spark Streaming application which does an SCD2 operation on Delta Lake. We were using Spark 3.0.0 and Delta Lake 0.7.0; after upgrading to Spark 3.2.0 and Delta 1.1.0, we see the following exception (under a load of 100K events): Caused…
1
vote
1 answer

Apache Spark with kafka stream - Missing Kafka

I am trying to set up Apache Spark with Kafka. I wrote a simple program locally, and it is failing; I am not able to figure it out from debugging. build.gradle.kts implementation ("org.jetbrains.kotlin:kotlin-stdlib:1.4.0") implementation…
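A "missing Kafka" failure in a local Structured Streaming program usually means the spark-sql-kafka integration jar is absent. A smoke-test sketch; the artifact coordinate in the comment is an assumption to be matched to the actual Spark and Scala versions:

```scala
// A "Failed to find data source: kafka" style error means the integration jar
// is missing. Assumed coordinate (match it to your Spark/Scala versions):
//   org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-smoke-test")
  .master("local[*]")
  .getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test")
  .load()

df.selectExpr("CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()
  .awaitTermination()
```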
1
vote
0 answers

Write Spark Stream to Phoenix table

I am trying to figure out how to write a Spark Stream to a Phoenix table in the least convoluted way. So far I have only found this solution: kafka-to-phoenix, which requires some deep ad-hoc engineering (to my noob eyes). I can tailor the linked…
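One less convoluted route is to reuse the batch Phoenix connector inside foreachBatch, since Structured Streaming has no native Phoenix sink. A hedged sketch; the format name and option keys follow the phoenix-spark connector and should be checked against the Phoenix version in use, and all table and host values are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("stream-to-phoenix").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

stream
  .selectExpr("CAST(key AS STRING) AS ID", "CAST(value AS STRING) AS PAYLOAD")
  .writeStream
  .foreachBatch { (batch: DataFrame, _: Long) =>
    // Batch Phoenix connector reused per micro-batch; the format name and
    // options ("phoenix" / "table" / "zkUrl") vary by connector version.
    batch.write
      .format("phoenix")
      .option("table", "MY_TABLE")       // placeholder target table
      .option("zkUrl", "zkhost:2181")    // placeholder ZooKeeper quorum
      .mode("append")
      .save()
  }
  .start()
  .awaitTermination()
```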
1
vote
1 answer

foreach() method with Spark Streaming errors

I'm trying to write data pulled from Kafka to a BigQuery table every 120 seconds. I would like to do some additional operations which, per the documentation, should be possible inside the foreach() or foreachBatch() method. As a test I wanted to print a…
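A sketch of that test with placeholder topic and server values; the distinction between the two methods is noted in the comments:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("foreachbatch-test").getOrCreate()

val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder
  .option("subscribe", "events")                        // placeholder
  .load()

kafkaDf.writeStream
  .trigger(Trigger.ProcessingTime("120 seconds"))
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // foreachBatch runs on the driver with an ordinary batch DataFrame,
    // so show()/count() print where you expect. Per-row foreach() instead
    // runs on executors, and its println output lands in executor logs.
    println(s"batch $batchId: ${batch.count()} rows")
    batch.show(5, truncate = false)
  }
  .start()
  .awaitTermination()
```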
1
vote
1 answer

Spark and Kafka: how to increase parallelism for a producer sending a large batch of records, improving network usage?

I am trying to understand how I can send (produce) a large batch of records to a Kafka topic from Spark. From the docs I can see that there is an attempt to reuse the same producer across tasks on the same workers. When sending a lot of records at…
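A hedged sketch of the two usual levers: repartitioning to raise the number of concurrently writing tasks (a producer instance is reused per executor JVM), and passing producer tuning through the kafka.-prefixed options. The source path, topic, and values are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-bulk-produce").getOrCreate()

val df = spark.read.parquet("/data/records")   // hypothetical source

// Write parallelism is governed by the number of concurrent tasks, so
// repartition before writing; kafka.* options go straight to the producer.
df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
  .repartition(64)
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("topic", "out-topic")
  .option("kafka.batch.size", "131072")      // bigger producer batches
  .option("kafka.linger.ms", "20")           // wait briefly to fill them
  .option("kafka.compression.type", "lz4")   // fewer bytes on the wire
  .save()
```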
1
vote
1 answer

Can I use Airflow to start/stop a Spark streaming job?

I have two types of jobs: Spark batch jobs and Spark streaming jobs. I would like to schedule and manage them both with Airflow. Airflow is designed for jobs that stop, but I want to use it for my streaming job. Can anyone give me some idea or other…
1
vote
1 answer

java.io.IOException: Failed to write statements to batch_layer.test. The latest exception was Key may not be empty

I am trying to count the number of words in a text and save the result to the Cassandra database. The producer reads the data from a file and sends it to Kafka. The consumer uses Spark Streaming to read and process the data, and then sends the result of the…
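"Key may not be empty" is typically Cassandra rejecting a row whose partition key is an empty string, which a plain split on whitespace can produce. A sketch using the keyspace and table from the error message, assumed column names, and the spark-cassandra-connector:

```scala
import com.datastax.spark.connector._             // SomeColumns
import com.datastax.spark.connector.streaming._   // saveToCassandra on DStreams
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wordcount-to-cassandra")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val ssc = new StreamingContext(conf, Seconds(10))

// Stand-in source; in the real job this is the Kafka direct stream.
val lines = ssc.socketTextStream("localhost", 9999)

lines.flatMap(_.split("\\s+"))
  .filter(_.nonEmpty)        // empty tokens become empty partition keys,
  .map(w => (w, 1L))         // which Cassandra rejects: "Key may not be empty"
  .reduceByKey(_ + _)
  .saveToCassandra("batch_layer", "test", SomeColumns("word", "count"))

ssc.start()
ssc.awaitTermination()
```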
1
vote
0 answers

Spark streaming Dynamic Schema Evolution from Kafka Eventhub on Microbatch

We are streaming data from the Kafka Eventhub. The records may have a nested structure. The schema is inferred dynamically from the data, and the Delta table is created with the schema of the first incoming batch of data. Note: The data read from…
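A common pattern here is to do the write inside foreachBatch and let Delta merge new columns per micro-batch. A sketch; parseBatch is a hypothetical stand-in for the asker's own parsing and flattening logic, and the endpoint, topic, and path are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("delta-schema-evolution").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "eventhub-host:9093")  // placeholder
  .option("subscribe", "events")                            // placeholder
  .load()

// Hypothetical stand-in for the asker's JSON parsing / flattening logic.
def parseBatch(raw: DataFrame): DataFrame =
  raw.selectExpr("CAST(value AS STRING) AS json")

stream.writeStream
  .foreachBatch { (batch: DataFrame, _: Long) =>
    parseBatch(batch).write
      .format("delta")
      .option("mergeSchema", "true")   // let new columns evolve the table schema
      .mode("append")
      .save("/delta/events")           // placeholder table path
  }
  .start()
  .awaitTermination()
```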
1
vote
1 answer

Spark Streaming: Read from HBase by received stream keys?

What is the best way to compare data received in Spark Streaming to existing data in HBase? We receive data from Kafka as a DStream, and before writing it down to HBase we must scan HBase for data based on the keys received from Kafka, do some calculation…
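For key-based lookups, a full HBase scan is unnecessary; point gets inside mapPartitions, with one connection per partition, are the usual pattern. A sketch with placeholder table, column family, and qualifier names:

```scala
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}

// Point lookups by received key, one HBase connection per partition.
def lookup(keys: Iterator[String]): Iterator[(String, Option[String])] = {
  val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = conn.getTable(TableName.valueOf("existing_data"))
  val out = keys.map { k =>
    val result = table.get(new Get(Bytes.toBytes(k)))
    val value  = Option(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")))
      .map(Bytes.toString)
    (k, value)
  }.toList               // materialize before closing the connection
  table.close()
  conn.close()
  out.iterator
}

// In the streaming job: dstream.transform(_.mapPartitions(lookup)), or batch
// the keys into a single table.get(gets: java.util.List[Get]) call per partition.
```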
1
vote
1 answer

How does Spark calculate the window start time for a given window interval?

Consider I have an input df with a timestamp column. When setting the window duration (with no sliding interval) to 10 minutes, with an input time of (2019-02-28 22:33:02), the window formed is (2019-02-28 22:30:02) to (2019-02-28 22:40:02). 8…
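Tumbling windows are aligned to the Unix epoch plus the optional startTime offset, not to the first event: conceptually, windowStart = timestamp - ((timestamp - startTime) % windowDuration). A small sketch to verify:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("window-start").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("2019-02-28 22:33:02").toDF("ts")
  .withColumn("ts", to_timestamp($"ts"))

df.groupBy(window($"ts", "10 minutes")).count().show(false)
// -> window.start 22:30:00, window.end 22:40:00 with the default startTime.
// window($"ts", "10 minutes", "10 minutes", "2 seconds") would shift the
// boundaries to 22:30:02 .. 22:40:02, matching the example in the question.
```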