Questions tagged [spark-streaming]

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. As of version 1.3.0, it supports exactly-once processing semantics, even in the face of failures.

5565 questions
2
votes
1 answer

What happens when I run out of memory to maintain the state with mapWithState

I have a very large number of keys and a limited cluster size. I am using mapWithState to update my states. As new data comes in, the number of keys increases. When I went to the Storage tab of the Spark UI, MapWithStateRDD is always stored in…
Rishi
  • 148
  • 1
  • 7
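For reference, MapWithStateRDDs are cached in memory by default, which is why unbounded key growth eventually shows up as memory pressure. A minimal sketch of the usual mitigation, a StateSpec timeout that evicts idle keys (the update function and durations below are hypothetical):

```scala
import org.apache.spark.streaming.{Seconds, State, StateSpec}

// Hypothetical update function: keeps a running count per key.
def updateCount(key: String, value: Option[Int], state: State[Long]): Option[(String, Long)] = {
  if (state.isTimingOut()) {
    // Key was idle past the timeout; Spark evicts it after this call,
    // so do not call state.update() here.
    None
  } else {
    val newCount = state.getOption.getOrElse(0L) + value.getOrElse(0)
    state.update(newCount)
    Some((key, newCount))
  }
}

val spec = StateSpec
  .function(updateCount _)
  .timeout(Seconds(3600)) // evict keys idle for an hour, capping state growth

// keyedStream: DStream[(String, Int)] built elsewhere
// val stateful = keyedStream.mapWithState(spec)
```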
2
votes
1 answer

Spark aggregateByKey on Dataset

Here's an example of aggregateByKey on mutable.HashSet[String] written by @bbejeck val initialSet = mutable.HashSet.empty[String] val addToSet = (s: mutable.HashSet[String], v: String) => s += v val mergePartitionSets = (p1: mutable.HashSet[String],…
faustineinsun
  • 451
  • 1
  • 6
  • 16
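The excerpt's example completes to roughly the following runnable sketch on an RDD (it assumes an existing SparkContext `sc`):

```scala
import scala.collection.mutable

// Collect the set of values seen per key, merging partial sets across partitions.
val initialSet = mutable.HashSet.empty[String]
val addToSet = (s: mutable.HashSet[String], v: String) => s += v
val mergePartitionSets =
  (p1: mutable.HashSet[String], p2: mutable.HashSet[String]) => p1 ++= p2

val pairs = sc.parallelize(Seq(("a", "x"), ("a", "y"), ("b", "z")))
val setsByKey = pairs.aggregateByKey(initialSet)(addToSet, mergePartitionSets)
// setsByKey: RDD[(String, mutable.HashSet[String])]
```

On a Dataset, the closest typed equivalent is ds.groupByKey(...).mapGroups(...) or a custom org.apache.spark.sql.expressions.Aggregator rather than aggregateByKey, which lives on pair RDDs.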
2
votes
0 answers

How do we process/scale variable size batches in Apache Spark Streaming

I am running a Spark Streaming process where I get a batch of data every n seconds. I am using repartition to scale the application. Since the repartition size is fixed, we get lots of small files when the batch size is very small. Is…
Alchemist
  • 849
  • 2
  • 10
  • 27
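One hedged approach to the small-files problem is to size the repartition to each batch instead of using a fixed count; `recordsPerPartition` and the output path below are assumptions:

```scala
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Repartition each batch proportionally to its record count, so tiny batches
// do not fan out into many near-empty files.
def writeRightSized(dstream: DStream[String], recordsPerPartition: Long = 100000L): Unit =
  dstream.foreachRDD { (rdd, time: Time) =>
    val count = rdd.count() // costs one extra pass over the batch
    val numPartitions = math.max(1L, count / recordsPerPartition).toInt
    rdd.repartition(numPartitions)
       .saveAsTextFile(s"/tmp/out/batch-${time.milliseconds}") // hypothetical path
  }
```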
2
votes
1 answer

Spark Streaming input rate drop

Running a Spark Streaming job, I have encountered the following behavior more than once. Processing starts well: the processing time for each batch is well below the batch interval. Then suddenly, the input rate drops to near zero. See these…
Socci
  • 337
  • 2
  • 12
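If the drop turns out to be rate-controller related, one thing worth checking is Spark's backpressure settings (available since 1.5), which let the ingestion rate adapt to processing delays instead of collapsing; the ceiling value in this sketch is an assumption:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("StreamingApp")
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional hard ceiling per receiver, in records/sec:
  .set("spark.streaming.receiver.maxRate", "10000")
```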
2
votes
0 answers

SparkStreaming: How to get list like collect()

I am a beginner with Spark Streaming. I want to load HBase records in a Spark Streaming app, so I wrote the code below in Python. My "load_records" function fetches HBase records and returns them. Spark Streaming cannot use collect().…
penlight
  • 617
  • 10
  • 26
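The usual workaround: a DStream has no collect(), but each batch's underlying RDD does, reached through foreachRDD. A Scala sketch of the shape (the same pattern applies in PySpark via dstream.foreachRDD(lambda rdd: ...)):

```scala
import org.apache.spark.streaming.dstream.DStream

// Collect each batch back to the driver as a plain array.
// Assumes per-batch volume is small enough to fit in driver memory.
def printEachBatch(dstream: DStream[String]): Unit =
  dstream.foreachRDD { rdd =>
    val records: Array[String] = rdd.collect() // list-like result, one batch at a time
    records.foreach(println)
  }
```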
2
votes
1 answer

Excluding hadoop dependency from spark library in sbt file

I am working on Spark 1.3.0. My build.sbt looks as follows: libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "1.3.0" % "provided", "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided", "org.apache.spark" %%…
Alok
  • 1,374
  • 3
  • 18
  • 44
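A common build.sbt shape for this: exclude the transitive hadoop-client from the Spark artifacts and pin your own (the Hadoop version below is hypothetical):

```scala
libraryDependencies ++= Seq(
  // Strip the Hadoop client Spark would pull in transitively...
  ("org.apache.spark" %% "spark-core" % "1.3.0" % "provided")
    .exclude("org.apache.hadoop", "hadoop-client"),
  // ...and declare the version you actually want on the classpath.
  "org.apache.hadoop" % "hadoop-client" % "2.6.0"
)
```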
2
votes
1 answer

Best solution to accumulate Spark Streaming DStream

I'm looking for the best solution to accumulate the last N messages in a Spark DStream. I'd also like to specify the number of messages to retain. For example, given the following stream, I'd like to retain the last 3 elements: Iteration …
user278530
  • 83
  • 2
  • 11
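Since windows in Spark Streaming are time-based rather than count-based, one hedged option is a bounded buffer maintained on the driver; this sketch assumes per-batch volume is small enough to collect:

```scala
import scala.collection.mutable
import org.apache.spark.streaming.dstream.DStream

// Keep a rolling "last N" on the driver. The queue lives only in the
// driver JVM; rdd.collect() brings each batch back before it is appended.
def retainLastN(dstream: DStream[String], n: Int): Unit = {
  val buffer = mutable.Queue.empty[String]
  dstream.foreachRDD { rdd =>
    rdd.collect().foreach { msg =>
      buffer.enqueue(msg)
      while (buffer.size > n) buffer.dequeue() // evict the oldest beyond N
    }
    println(s"last $n: ${buffer.mkString(", ")}")
  }
}
```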
2
votes
1 answer

Using Kafka to communicate between long running Spark jobs

I am new to Apache Spark and have a need to run several long-running processes (jobs) on my Spark cluster at the same time. Often, these individual processes (each of which is its own job) will need to communicate with each other. Tentatively, I'm…
smeeb
  • 27,777
  • 57
  • 250
  • 447
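A sketch of the pattern being considered: one job publishes its results to a Kafka topic that the other job consumes. The broker address and topic name are assumptions; the producer is built per partition so it is created on the executors, not captured from the driver:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Publish one partition's worth of results to a shared topic.
def publish(messages: Iterator[String]): Unit = {
  val props = new Properties()
  props.put("bootstrap.servers", "broker:9092") // hypothetical broker
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)
  try messages.foreach(m => producer.send(new ProducerRecord("jobA-results", m)))
  finally producer.close()
}

// In the producing job:
// resultStream.foreachRDD(_.foreachPartition(publish))
```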
2
votes
2 answers

Counting records of my RDDs in a large Dstream

I am trying to work with a large RDD as read by a file DStream. The code looks as follows: val creatingFunc = { () => val conf = new SparkConf() .setMaster("local[10]") .setAppName("FileStreaming") …
Mahdi
  • 787
  • 1
  • 8
  • 33
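Two common shapes for counting a DStream's records, sketched below; which one fits depends on whether the count is needed as a stream or per batch on the driver:

```scala
import org.apache.spark.streaming.dstream.DStream

def logCounts(dstream: DStream[String]): Unit = {
  // 1) As a transformed stream of per-batch counts:
  dstream.count().print()

  // 2) Or count each batch's RDD directly, e.g. to skip empty batches:
  dstream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) println(s"batch size = ${rdd.count()}")
  }
}
```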
2
votes
2 answers

Using Spark StreamingContext to Consume from Kafka topic

I am brand new to Spark & Kafka and am trying to get some Scala code (running as a Spark job) to act as a long-running process (not just a short-lived/scheduled task) and to continuously poll a Kafka broker for messages. When it receives messages, I…
smeeb
  • 27,777
  • 57
  • 250
  • 447
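A minimal long-running consumer sketch using the direct-stream API from that era's spark-streaming-kafka artifact (Kafka 0.8 integration); the broker address and topic name are assumptions:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaPoller {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaPoller")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream.map(_._2).foreachRDD(rdd => rdd.foreach(println)) // handle messages here

    ssc.start()
    ssc.awaitTermination() // blocks: the job keeps polling until stopped
  }
}
```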
2
votes
2 answers

Unable to serialize SparkContext in foreachRDD

I am trying to save streaming data from Kafka to Cassandra. I am able to read and parse the data, but when I call the lines below to save it, I get a Task not serializable exception. My class extends Serializable, but I am not sure why I…
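The usual cause: the closure passed to foreachRDD (or a map inside it) captures the enclosing class, which holds the non-serializable SparkContext. A sketch of the conventional fix, with the Cassandra write left as a placeholder:

```scala
import org.apache.spark.streaming.dstream.DStream

// Never reference the SparkContext, or anything that holds it (such as the
// enclosing class), from code that ships to executors.
def save(parsed: DStream[(String, String)]): Unit =
  parsed.foreachRDD { rdd =>
    // Runs on the driver: RDD-level calls are fine here.
    rdd.foreachPartition { records =>
      // Runs on executors: build non-serializable resources (sessions,
      // connections) HERE instead of capturing them from the outer scope.
      records.foreach { case (k, v) => /* session.execute(...) placeholder */ }
    }
  }
```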
2
votes
0 answers

How to stabilize spark streaming application with a handful of super big sessions?

I am running a Spark Streaming application based on the mapWithState DStream function. The application transforms input records into sessions based on a session ID field inside the records. A session is simply all of the records with the same ID.…
ZianyD
  • 171
  • 2
  • 12
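One hedged mitigation for a handful of oversized sessions is to cap what each key's state may hold inside the mapping function itself; `maxEvents` and the Session shape below are assumptions (no state timeout is configured in this sketch):

```scala
import org.apache.spark.streaming.State

case class Session(id: String, events: Vector[String])

// Bound each session to its most recent maxEvents records, so one giant
// session cannot dominate executor memory.
def trackSession(maxEvents: Int)(id: String, record: Option[String],
                                 state: State[Session]): Option[Session] = {
  val current = state.getOption.getOrElse(Session(id, Vector.empty))
  val updated = current.copy(events = (current.events ++ record).takeRight(maxEvents))
  state.update(updated)
  Some(updated)
}

// Plugged in via: StateSpec.function(trackSession(10000) _)
```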
2
votes
1 answer

Spark 2.0.0 twitter streaming driver is no longer available

During migration from Spark 1.6.2 to Spark 2.0.0, it turned out that the package org.apache.spark.streaming.twitter has been removed and Twitter streaming is no longer available, as is the dependency org.apache.spark
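For reference, the Twitter connector was moved out of the main Spark distribution in the 2.x line and now lives in Apache Bahir; a build.sbt sketch (match the Bahir version to your Spark release):

```scala
// Bahir hosts the connectors dropped from Spark 2.x, including Twitter.
libraryDependencies += "org.apache.bahir" %% "spark-streaming-twitter" % "2.0.0"
```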
2
votes
0 answers

Streaming pdf files using spark streaming filestream

I am building an application that scans PDF files and extracts data from them. I have already built an application that does batch processing using Spark Core, but now I want the data to be continuously streamed from the directory. How can I use Spark…
fady zohdy
  • 45
  • 1
  • 8
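textFileStream is line-oriented, so binary PDFs need the lower-level fileStream with a binary-capable input format. A sketch; WholePdfInputFormat is hypothetical and would have to be a custom FileInputFormat that emits each file as a single record:

```scala
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.spark.streaming.StreamingContext

// Hypothetical: WholePdfInputFormat extends FileInputFormat and yields one
// (offset, raw bytes) pair per PDF dropped into the watched directory.
// val ssc: StreamingContext = ...
// val pdfs = ssc.fileStream[LongWritable, BytesWritable, WholePdfInputFormat](
//   "hdfs:///incoming/pdfs")
// pdfs.map { case (_, bytes) => bytes.getBytes } // hand raw bytes to your extractor
```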
2
votes
1 answer

Spark History Logs Are Not Enabled with Oozie Spark Action in Cloudera

I am trying to follow these instructions to enable history logs with the Spark Oozie action: https://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html To ensure that your Spark job shows up in the Spark History Server, make sure to…
Alchemist
  • 849
  • 2
  • 10
  • 27
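Per the linked Cloudera page, the history-server properties are passed through the action's spark-opts element; a sketch of the workflow fragment, with host names and the HDFS path as placeholders:

```xml
<spark-opts>
  --conf spark.eventLog.enabled=true
  --conf spark.eventLog.dir=hdfs://namenode:8020/user/spark/applicationHistory
  --conf spark.yarn.historyServer.address=http://historyserver.example.com:18088
</spark-opts>
```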