Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. Direct Stream approach provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.

250 questions
3
votes
1 answer

How to optimize number of executor instances in spark structured streaming app?

Runtime: YARN cluster mode. Application: Spark structured streaming, reading data from a Kafka topic. About the Kafka topic: 1 topic with 4 partitions for now (the number of partitions can be changed); at most 2000 records are added to the topic per second. I've found…
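As a starting point (an assumption, not from the question itself), a common rule of thumb is to size total cores to the number of Kafka partitions so each partition maps to one task, then scale out after measuring. A hypothetical spark-submit sketch for the 4-partition case; executor counts, memory, and the jar name are placeholders:

```shell
# Sketch: 2 executors x 2 cores = 4 total cores for 4 Kafka partitions.
# All values below are illustrative starting points, not tuned numbers.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 4g \
  my-streaming-app.jar
```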
3
votes
1 answer

Spark streaming and kafka Missing required configuration "partition.assignment.strategy" which has no default value

I am trying to run a Spark streaming application with Kafka using YARN. I am getting the following stack trace error: Caused by: org.apache.kafka.common.config.ConfigException: Missing required configuration "partition.assignment.strategy" which…
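This error often indicates a kafka-clients version mismatch on the classpath; one commonly reported workaround is to set the assignment strategy explicitly in the consumer parameters. A minimal sketch (not the asker's code) using the spark-streaming-kafka-0-10 DStream API, with placeholder broker, group, and topic names and an assumed existing StreamingContext `ssc`:

```scala
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// "partition.assignment.strategy" is set explicitly because some
// kafka-clients versions on the classpath ship without a default for it.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092", // placeholder
  "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "group.id" -> "my-group",              // placeholder
  "partition.assignment.strategy" ->
    "org.apache.kafka.clients.consumer.RangeAssignor"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, // an existing StreamingContext
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
)
```

The more durable fix is usually aligning the kafka-clients version shipped with the spark-streaming-kafka artifact against whatever YARN puts on the classpath.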
3
votes
2 answers

How to read from specific Kafka partition in Spark structured streaming

I have three partitions for my Kafka topic and I was wondering if I could read from just one partition out of the three. My consumer is a Spark structured streaming application. Below are my existing Kafka settings in Spark. val inputDf =…
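For the structured streaming Kafka source, the `assign` option (used instead of `subscribe`) takes a JSON map of topic to partition list and reads only the listed partitions. A sketch with placeholder broker and topic names, assuming an existing `spark` session:

```scala
// Read only partition 0 of "my-topic". "assign" replaces "subscribe" and
// takes a JSON map of topic -> partitions; names here are placeholders.
val inputDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("assign", """{"my-topic":[0]}""")
  .option("startingOffsets", "earliest")
  .load()
```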
3
votes
1 answer

Spark Kafka streaming in spark 2.3.0 with python

I recently upgraded to Spark 2.3.0. I had an existing Spark job which used to run on Spark 2.2.0. I am facing a Java AbstractMethodError exception. My simple code: from pyspark import SparkContext …
3
votes
1 answer

Spark Streaming Kafka Stream batch execution

I'm new to Spark streaming and I have a general question about its usage. I'm currently implementing an application which streams data from a Kafka topic. Is it a common scenario to use the application to run a batch only once, for…
3
votes
0 answers

Spark Streaming - Kafka - java.nio.BufferUnderflowException

I'm running into the below error while trying to consume messages from Kafka through Spark streaming (Kafka direct API). This used to work fine when using the Spark standalone cluster manager. We just switched to Cloudera 5.7 using YARN to manage Spark…
3
votes
1 answer

Spark Streaming Kafka createDirectStream - Spark UI shows input event size as zero

I have implemented Spark Streaming using createDirectStream. My Kafka producer is sending several messages every second to a topic with two partitions. On the Spark streaming side, I read Kafka messages every second and then I'm windowing them over 5…
2
votes
1 answer

Spark not giving equal tasks to all executors

I am reading from a Kafka topic which has 5 partitions. Since 5 cores are not sufficient to handle the load, I am repartitioning the input to 30. I have given 30 cores to my Spark process, with 6 cores on each executor. With this setup I was…
2
votes
0 answers

Spark structured streaming how to write to Kafka in Protobuf format

Spark: 3.0.0, Scala: 2.12, Confluent. I have a Spark structured streaming job and am looking for an example of writing data frames to Kafka in Protobuf format. I read messages from PostgreSQL and after doing all the transformations have a data frame…
2
votes
1 answer

Right way to read stream from Kafka topic using checkpointLocation offsets

I'm trying to develop a small Spark app (using Scala) to read messages from Kafka (Confluent) and write (insert) them into a Hive table. Everything works as expected, except for one important feature: managing offsets when the application is…
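With the structured streaming Kafka source, the usual pattern is to let `checkpointLocation` own the offsets: on restart the query resumes from the offsets committed in the checkpoint directory, and `startingOffsets` only applies to a fresh query. A sketch assuming an existing streaming DataFrame `df`; the table name and checkpoint path are placeholders:

```scala
import org.apache.spark.sql.DataFrame

// Each micro-batch is appended to a Hive table; "my_hive_table" and the
// checkpoint path are placeholders for illustration.
def writeBatch(batchDf: DataFrame, batchId: Long): Unit = {
  batchDf.write.mode("append").insertInto("my_hive_table")
}

val query = df.writeStream
  .foreachBatch(writeBatch _)
  .option("checkpointLocation", "/tmp/checkpoints/my-app")
  .start()
```

Using a named method rather than an inline lambda sidesteps a known `foreachBatch` overload ambiguity in Scala 2.12.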
2
votes
1 answer

Does Spark Structured Streaming have some timeout issue when reading streams from a Kafka topic?

I implemented a Spark job to read a stream from a Kafka topic using foreachBatch in structured streaming. val df = spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "mykafka.broker.io:6667") .option("subscribe",…
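When timeouts are suspected on the source side, one knob worth checking (hedged; whether it applies depends on the actual error) is the source's consumer poll timeout, `kafkaConsumer.pollTimoutMs` is not a real option but `kafkaConsumer.pollTimeoutMs` is documented in the Structured Streaming + Kafka integration guide. A sketch reusing the question's broker address, with a placeholder topic:

```scala
// Raise the Kafka consumer poll timeout for the structured streaming source.
// The topic name is a placeholder; the broker address is from the question.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "mykafka.broker.io:6667")
  .option("subscribe", "my-topic")
  .option("kafkaConsumer.pollTimeoutMs", "180000")
  .load()
```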
2
votes
1 answer

Driver stops executors without a reason

I have an application based on Spark structured streaming 3 with Kafka, which is processing some user logs, and after some time the driver starts killing the executors and I don't understand why. The executors don't contain any errors. I'm…
2
votes
1 answer

Why a new batch is triggered without getting any new offsets in streaming source?

I have multiple Spark structured streaming jobs, and the usual behaviour I see is that a new batch is triggered only when there are new offsets in Kafka, which is used as the source for the streaming query. But when I run this example, which…
2
votes
1 answer

kafka kafka-consumer-groups.sh --describe returns no output for a consumer group

Kafka version 1.1. --list can get the consumer groups: bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list --command-config config/client_security.properties Note: This will not show information about old Zookeeper-based…
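A frequent cause of empty `--describe` output on a secured cluster is forgetting to pass the same `--command-config` that made `--list` work. A sketch with a placeholder group name:

```shell
# Describe one group; reuse the same security properties as --list, otherwise
# the tool cannot authenticate and may return nothing. Group is a placeholder.
bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group \
  --command-config config/client_security.properties
```

Note that `--describe` also shows nothing while a group has no active members and no committed offsets.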
2
votes
3 answers

How do I convert a dataframe to JSON and write to kafka topic with key

I'm trying to write a dataframe to Kafka in JSON format and add a key to the data frame in Scala. I'm currently working with this sample from the Kafka-Spark integration: df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") .write …
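The Kafka sink reads whatever columns are named `key` and `value`, so one approach is to build those columns explicitly: cast a chosen column to string for the key and serialize the whole row with `to_json(struct(...))` for the value. A sketch where the `id` column, broker, and topic are placeholder assumptions:

```scala
import org.apache.spark.sql.functions.{col, struct, to_json}

// Build a "key" column from a placeholder "id" column and serialize the
// full row as the JSON "value"; broker/topic names are placeholders too.
df.select(
    col("id").cast("string").as("key"),
    to_json(struct(df.columns.map(col): _*)).as("value")
  )
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("topic", "my-topic")
  .save()
```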