Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Since version 1.3.0, it supports exactly-once processing semantics, even in the face of failures.
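
For orientation, a minimal Spark Streaming program (the names, host, port, and the 10-second batch interval below are arbitrary choices for illustration) looks roughly like this:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Streaming context with a 10-second micro-batch interval (arbitrary).
    val conf = new SparkConf().setAppName("StreamingWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Read lines from a socket, split into words, count them per batch.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```
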
Questions tagged [spark-streaming]
5565 questions
2
votes
1 answer
Architecture of a real-time streaming job
I am working on a streaming application using Spark Streaming, and I want to index my data into Elasticsearch.
My analysis:
I can directly push data from Spark to Elasticsearch, but I feel that in this case the two components will be tightly coupled.
If…
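
A rough sketch of the tightly coupled variant described above, assuming the elasticsearch-hadoop (elasticsearch-spark) connector is on the classpath and its es.nodes setting is configured elsewhere; the index/type name and the Map payload are placeholders:

```scala
import org.apache.spark.streaming.dstream.DStream
import org.elasticsearch.spark.rdd.EsSpark

// Sketch only: writes each micro-batch straight to Elasticsearch.
// "events/doc" is a placeholder index/type.
def indexDirectly(events: DStream[Map[String, Any]]): Unit = {
  events.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      EsSpark.saveToEs(rdd, "events/doc")
    }
  }
}
```

The decoupled alternative the question is weighing would instead publish to an intermediate queue (e.g. Kafka) and index into Elasticsearch from a separate consumer.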

kushagra mittal
- 343
- 5
- 17
2
votes
2 answers
Spark: java.io.NotSerializableException
I want to pass a path to the function saveAsTextFile that runs in Spark Streaming. However, I get a java.io.NotSerializableException. Usually in similar cases I use a skeleton object, but in this particular case I don't know how to solve the issue…
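
A common cause of this error (not necessarily this poster's) is that the closure passed to the streaming operation captures the enclosing, non-serializable class rather than just the path string. A minimal sketch of the usual workaround, with invented names, copies the path into a local val so only that value is serialized:

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical example: Writer itself is not serializable.
class Writer(outputPrefix: String) {
  def save(stream: DStream[String]): Unit = {
    // Copy the field into a local val so the closure captures only the String,
    // not the whole Writer instance.
    val prefix = outputPrefix
    stream.foreachRDD { (rdd, time) =>
      rdd.saveAsTextFile(s"$prefix/${time.milliseconds}")
    }
  }
}
```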

Lobsterrrr
- 325
- 1
- 5
- 15
2
votes
0 answers
Issue with Spark job tuning using the "executor-memory" parameter
We have 3 compute nodes in our cluster, each with 8 cores and 30 GB of RAM assigned, and we are executing performance tests in order to get the optimal performance.
The optimal performance was achieved by considering the following…

Sumit Khurana
- 159
- 1
- 10
2
votes
0 answers
Spark Streaming: slow checkpointing when calling Rserve
I wrote a Spark Streaming application in Java. It reads stock trades from a data feed receiver, converts them to Tick objects, and uses a microbatch interval, window interval, and sliding interval of 10 seconds. A JavaPairDStream…
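
One knob commonly discussed for slow checkpointing, noted here only as an illustration rather than as this poster's fix, is the per-DStream checkpoint interval, which can be set to a multiple of the slide interval; the source, directory, and figures below are placeholders:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumes ssc is an existing StreamingContext with a 10-second batch interval.
def configureCheckpointing(ssc: StreamingContext): Unit = {
  ssc.checkpoint("hdfs:///tmp/checkpoints")                // placeholder directory

  val trades = ssc.socketTextStream("feed-host", 9999)     // placeholder source
  val windowed = trades.window(Seconds(10), Seconds(10))   // 10 s window, 10 s slide

  // Checkpoint the windowed stream less often than every batch
  // (e.g. every 50 seconds) to reduce checkpointing overhead.
  windowed.checkpoint(Seconds(50))
  windowed.print()
}
```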

Mark
- 21
- 2
2
votes
0 answers
Read Spark Streaming checkpoint data
I'm writing a Spark Streaming application that reads from Kafka. In order to have exactly-once semantics, I'd like to use the direct Kafka stream together with Spark Streaming's native checkpointing.
The problem is that checkpointing makes practically…
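
For reference, the native checkpointing pattern referred to above usually follows the getOrCreate sketch below (the directory is a placeholder); on restart, the context, the DStream graph, and the Kafka offsets are rebuilt from the checkpoint:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("KafkaDirectWithCheckpoint")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // Define the direct Kafka stream and all transformations here, inside the
  // factory, so they can be recovered from the checkpoint on restart.
  ssc
}

// Recover from the checkpoint if one exists, otherwise build a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```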

mgaido
- 2,987
- 3
- 17
- 39
2
votes
1 answer
How to stop a long-running Spark Streaming step in AWS EMR
I use AWS EMR for our Spark Streaming. I add a step in EMR that reads data from a Kinesis stream. What I need is an approach to stop this step and add a new one.
Right now I spawn a thread from the Spark driver and listen to an SQS queue for a message…
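
The stop itself is typically a call to StreamingContext.stop with graceful shutdown enabled; a sketch of the monitor-thread pattern described above, with the SQS polling abstracted into a caller-supplied predicate:

```scala
import org.apache.spark.streaming.StreamingContext

// Sketch: poll for an external shutdown signal (e.g. an SQS message) and then
// stop the streaming context gracefully, letting in-flight batches finish.
def runShutdownMonitor(ssc: StreamingContext, shouldStop: () => Boolean): Thread = {
  val monitor = new Thread(new Runnable {
    override def run(): Unit = {
      while (!shouldStop()) Thread.sleep(10000)
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }
  })
  monitor.setDaemon(true)
  monitor.start()
  monitor
}
```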

Aravindh S
- 1,185
- 11
- 19
2
votes
1 answer
Apache Zeppelin 0.6.1: Run Spark 2.0 Twitter Stream App
I have a cluster with Spark 2.0 and Zeppelin 0.6.1 installed. Since the class TwitterUtils.scala was moved from the Spark project to Apache Bahir, I can't use TwitterUtils in my Zeppelin notebook anymore.
Here are the snippets of my notebook:
Dependency…
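
Assuming the Apache Bahir Twitter connector (for example org.apache.bahir:spark-streaming-twitter_2.11:2.0.0) has been added as a dependency of the Spark interpreter, the notebook code itself stays close to the pre-2.0 form; a sketch, with credentials assumed to be provided as twitter4j system properties:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// sc is the SparkContext that Zeppelin provides; the 10-second batch
// interval is an arbitrary choice.
val ssc = new StreamingContext(sc, Seconds(10))
val tweets = TwitterUtils.createStream(ssc, None)  // None: use system-property auth
tweets.map(_.getText).print()
ssc.start()
```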

D. Müller
- 3,336
- 4
- 36
- 84
2
votes
0 answers
How to manage failover of Spark Streaming process with Kafka
I am using Spark's direct streaming API to get Spark Streaming batches. But since the direct streaming API does not synchronize offsets with ZooKeeper, we lose the events that were received while the Spark Streaming application was down. I want to…
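
A pattern often paired with the direct API (sketched here against the Spark 1.x spark-streaming-kafka classes, with the actual offset store left abstract) is to read each batch's offset ranges and persist them yourself, so a restarted job can resume from the last committed position:

```scala
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.HasOffsetRanges

// Sketch: after processing each batch, extract its Kafka offset ranges and
// hand them to a caller-supplied commit function (ZooKeeper, a database, ...).
def processAndCommit[T](stream: DStream[T], commit: (String, Int, Long) => Unit): Unit = {
  stream.foreachRDD { rdd =>
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    // ... process rdd here ...
    offsetRanges.foreach { range =>
      commit(range.topic, range.partition, range.untilOffset)
    }
  }
}
```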

Alchemist
- 849
- 2
- 10
- 27
2
votes
0 answers
Spark Streaming 2.0.0 freezes after several days under load
We are running on AWS EMR 5.0.0 with Spark 2.0.0.
Consuming from a 125-shard Kinesis stream.
Feeding 19k events/s using 2 message producers, each message about 1 KB in size.
Consuming using a cluster of 20 machines.
The code has a flatMap(),…

visitor
- 91
- 4
2
votes
1 answer
How to run multiple actions on the same Spark stream
I am using Spark Streaming along with RabbitMQ. The streaming job fetches the data from RabbitMQ and applies some transformations and actions. I want to know how to apply multiple actions (i.e. calculate two different feature sets) on the same…
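
The usual pattern, sketched below with placeholder feature computations, is to cache the shared stream and attach each action to it separately, so the second action does not recompute the source:

```scala
import org.apache.spark.streaming.dstream.DStream

// Sketch: derive two different feature sets from one input stream.
// The two map functions are placeholders for real feature logic.
def runTwoActions(messages: DStream[String]): Unit = {
  messages.cache()  // reuse the received batches across both actions

  val featuresA = messages.map(m => ("featureA", m.length))
  val featuresB = messages.map(m => ("featureB", m.split(" ").length))

  featuresA.print()  // first output operation
  featuresB.print()  // second output operation on the same cached stream
}
```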

Naresh
- 5,073
- 12
- 67
- 124
2
votes
0 answers
Sending only one message per broker on a Kafka cluster
I am using a single multi-broker Kafka cluster with Spark Streaming to fetch data. The problem I am facing is that, under one topic, the same message gets sent across all brokers.
How do I limit Kafka to sending only one message per broker so that there…

Anand K
- 21
- 2
2
votes
1 answer
SQL Stored Procedure to Scala/Spark Streaming
I am currently working on transferring an archaic system, written mostly in SQL stored procedures, to Scala to run on Spark. The stored procedures are batch jobs run once per day/week/month/year, on "Request" objects, that can take hours to run.
Do…

terminatur
- 628
- 1
- 6
- 21
2
votes
2 answers
Spark Streaming "ERROR JobScheduler: error in job generator"
I built a Spark Streaming application to keep receiving messages from Kafka and then write them into an HBase table.
This app runs well for the first 25 minutes. When I input KV pairs like 1;name1, 2;name2 in kafka-console-producer, they are able to…
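
For context on the write path described above, a commonly shown Kafka-to-HBase sketch; the table, column family, and qualifier names are placeholders, and connection handling is simplified to one connection per partition:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.dstream.DStream

// Sketch: write "key;value" messages into an HBase table.
def writeToHBase(messages: DStream[String]): Unit = {
  messages.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = connection.getTable(TableName.valueOf("kv_table"))  // placeholder table
      records.foreach { record =>
        val parts = record.split(";", 2)
        if (parts.length == 2) {
          val put = new Put(Bytes.toBytes(parts(0)))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes(parts(1)))
          table.put(put)
        }
      }
      table.close()
      connection.close()
    }
  }
}
```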

Frank Kong
- 1,010
- 1
- 20
- 32
2
votes
0 answers
How to have the log file written locally using Logback in a Spark Streaming application running on YARN
The logs are not getting created in the file mentioned below, but they are printed to the console when the application is running. Please help me with the settings; I am new to Spark and to logging. The path given below is on the server in which…

vivek
- 71
- 8
2
votes
1 answer
NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
I encountered the following exception:
Exception in thread "main" java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
I have enabled checkpointing outside, and use this…
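
This error commonly appears when a non-serializable object (a connection, a client, the enclosing class) is captured by a DStream function while checkpointing is enabled. One widely used workaround, sketched here with an invented resource type, is to create such objects inside foreachPartition on the executors instead of capturing them in the closure:

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical non-serializable resource, e.g. a database or HTTP client.
class SinkClient { def send(s: String): Unit = println(s) }

def writeOut(stream: DStream[String]): Unit = {
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // Created on the executor for each partition, never shipped inside the
      // checkpointed closure, so it does not need to be serializable.
      val client = new SinkClient
      records.foreach(client.send)
    }
  }
}
```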

Jaming LAM
- 141
- 3
- 8