Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.

250 questions
1 vote, 1 answer

How to parse a json string column in pyspark's DataStreamReader and create a Data Frame

I am reading messages from a Kafka topic: messageDFRaw = spark.readStream\ .format("kafka")\ .option("kafka.bootstrap.servers", "localhost:9092")\ .option("subscribe", "test-message")\ …
1 vote, 3 answers

How to include kafka timestamp value as columns in spark structured streaming?

I am looking for a way to add the Kafka timestamp value to my Spark Structured Streaming schema. I have extracted the value field from Kafka and built a DataFrame from it. My issue is that I also need to get the timestamp field (from Kafka) along with…
1 vote, 1 answer

Store Message Offset in Kafka using KafkaUtils.createDirectStream

How do I store the message offset in Kafka if I am using KafkaUtils.createDirectStream to read the messages? Kafka loses the offset value every time the application goes down. It then falls back to the value provided in auto.offset.reset (which is latest)…
user1326784
1 vote, 0 answers

Spark streaming slow down

In our Spark app we're consuming a Kafka stream and storing the data in Cassandra. At first we ran the stream without backpressure and saw a weird anomaly: processing time was constant at ~1 minute, however the scheduling delay was…
Tomas Bartalos
1 vote, 3 answers

Array of JSON to Dataframe in Spark received by Kafka

I'm writing a Spark application in Scala using Spark Structured Streaming that receives JSON-formatted data from Kafka. The application can receive either a single JSON object or multiple JSON objects, formatted in this…
Vinc
1 vote, 0 answers

Spark Streaming from Apache Kafka

I came across the following: For possible kafkaParams, see Kafka consumer config docs. If your Spark batch duration is larger than the default Kafka heartbeat session timeout (30 seconds), increase heartbeat.interval.ms and session.timeout.ms…
1 vote, 1 answer

ExceptionInInitializerError Spark Streaming Kafka

I am trying to connect Spark Streaming to Kafka in a simple application. I based this application on the example from the Spark documentation. When I try to run it, I get the following exception: Exception in thread "main"…
Cassie
1 vote, 0 answers

Spark Streaming Kafka MongoDB time out exception

I am new to Spark Streaming and am trying to implement a Kafka-MongoDB integration in which my code pulls JSON data from a Kafka topic and inserts it into MongoDB. Below is my code: package com.streams.sparkmongo import…
1 vote, 0 answers

Issue Storing offset in Kafka for Spark Streaming Application

In our cluster we have Kafka 0.10.1 and Spark 2.1.0. The Spark Streaming application works fine with the checkpointing mechanism (checkpoints on HDFS). However, we noticed that when using checkpoints, the streaming application does not restart if there is a…
java_enthu
1 vote, 0 answers

spark-streaming-kafka-0-10: KafkaConsumer is not safe for multi-threaded access

I am trying to move from spark-streaming-kafka-0-8 to spark-streaming-kafka-0-10 and ran into the following error: KafkaConsumer is not safe for multi-threaded access. We have multiple Kafka clusters in different DCs that I want to consume…
1 vote, 2 answers

Kafka createDirectStream in Spark Streaming

I'm trying the example code from the Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher). The code runs without any error, but I do not receive any records. If I run kafka-console-consumer.sh --from-beginning, I can get…
1 vote, 1 answer

spark-streaming kafka kerberos

I'm working on a PoC Spark Streaming job pulling from Kafka. I am able to use the same code against a non-secured Kafka 0.10 cluster; however, when I switch to an SSL/Kerberos (HDP 2.5) setup, I get an exception: Caused by:…
1 vote, 1 answer

spark streaming + kafka sbt compilation

I have an example with Spark Streaming + Kafka. It works well from the IDE, but when I try to compile it with SBT from the console (sbt compile), I get an error. The Main class: val conf = new…
1 vote, 1 answer

Spark Streaming + kafka "JobGenerator" java.lang.NoSuchMethodError

I'm new to Spark Streaming and Kafka, and I don't understand this runtime exception. I've already set up the Kafka server. Exception in thread "JobGenerator" java.lang.NoSuchMethodError:…
1 vote, 1 answer

DSE Spark Streaming+Kafka NoSuchMethodError

I am trying to submit a Spark Streaming + Kafka job that just reads lines of strings from a Kafka topic. However, I am getting the following exception: 15/07/24 22:39:45 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job …