Questions tagged [spark-streaming]

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. As of version 1.3.0, it supports exactly-once processing semantics, even in the face of failures.

5565 questions
2
votes
1 answer

Store an Algebird Bloom filter with Storehaus

I have a Spark job whose final output is an Algebird Bloom filter, and I need to reuse this Bloom filter in another Spark job. Is there a way to store this Bloom filter in a KV store (e.g. Redis) using Twitter Storehaus and retrieve it in the…
arnaud briche
  • 1,479
  • 3
  • 20
  • 25
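A minimal sketch of one way to do this, assuming an Algebird version where the filter type is plain BF and that it is Java-serializable (it normally must be, since Spark ships it between stages): round-trip the filter through a byte array, which any byte-oriented KV store, Storehaus-backed Redis included, can hold.

    import java.io._
    import com.twitter.algebird.BF

    // Sketch: turn an Algebird Bloom filter into bytes and back so it can be
    // written to and read from any byte-oriented KV store.
    def serialize(bf: BF): Array[Byte] = {
      val bos = new ByteArrayOutputStream()
      val oos = new ObjectOutputStream(bos)
      oos.writeObject(bf)
      oos.close()
      bos.toByteArray
    }

    def deserialize(bytes: Array[Byte]): BF = {
      val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
      try ois.readObject().asInstanceOf[BF] finally ois.close()
    }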
2
votes
1 answer

java.io.NotSerializableException in Spark Streaming with checkpointing enabled

code below: def main(args: Array[String]) { val sc = new SparkContext val sec = Seconds(3) val ssc = new StreamingContext(sc, sec) ssc.checkpoint("./checkpoint") val rdd = ssc.sparkContext.parallelize(Seq("a","b","c")) val…
Guo
  • 1,761
  • 2
  • 22
  • 45
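The usual fix for this class of error (a sketch, not necessarily the asker's exact bug): with checkpointing on, Spark serializes the whole DStream graph, so the graph should be built inside a factory function passed to StreamingContext.getOrCreate rather than capturing outer, non-serializable state.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("CheckpointExample")
      val ssc = new StreamingContext(conf, Seconds(3))
      ssc.checkpoint("./checkpoint")
      // define all DStream transformations here, inside the factory
      ssc
    }

    // Recovers from the checkpoint if one exists, otherwise calls the factory.
    val ssc = StreamingContext.getOrCreate("./checkpoint", createContext _)
    ssc.start()
    ssc.awaitTermination()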
2
votes
1 answer

Why do Spark Streaming executors start at different times?

I'm using Spark Streaming 1.6, which uses Kafka as a source. My input arguments are as follows: num-executors 5, num-cores 4, batch interval 10 sec, maxRate 600, blockInterval 350 ms. Why do some of my executors start later than…
Vadym B.
  • 681
  • 7
  • 21
2
votes
1 answer

Spark Streaming: print on received stream

What I am trying to achieve is basically to print "hello world" each time I receive a stream of data. I know that on each stream I can call the function foreachRDD, but that does not help me because: it might be that there is no data processed; I don't…
Kevin Cohen
  • 1,211
  • 2
  • 15
  • 22
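A minimal sketch of the usual answer: foreachRDD fires once per batch interval even when the batch is empty, so guard with isEmpty if you only want output when data actually arrived.

    import org.apache.spark.streaming.dstream.DStream

    def printPerBatch(stream: DStream[String]): Unit =
      stream.foreachRDD { rdd =>
        // runs on the driver once per batch; skip batches with no data
        if (!rdd.isEmpty()) println("hello world")
      }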
2
votes
1 answer

Does Apache Storm have machine learning libraries like Apache Spark does?

I am comparing Apache Storm and Apache Spark Streaming to choose a distributed realtime computation system. There are already lots of discussions comparing these two technologies, for instance…
Yassir S
  • 1,032
  • 3
  • 21
  • 44
2
votes
1 answer

error: value succinct is not a member of org.apache.spark.rdd.RDD[String]

I am trying out SuccinctRDD as a search mechanism. Below is what I am trying, as per the docs: import edu.berkeley.cs.succinct.kv._ val data = sc.textFile("file:///home/aman/data/jsonDoc1.txt") val succintdata = data.succinct.persist() The link…
Amaresh
  • 3,231
  • 7
  • 37
  • 60
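A plausible fix, going by the Succinct docs: the .succinct enrichment is defined on RDD[Array[Byte]] (via edu.berkeley.cs.succinct._), not on RDD[String], so the lines need converting to bytes first.

    import edu.berkeley.cs.succinct._ // enriches RDD[Array[Byte]] with .succinct

    val data = sc.textFile("file:///home/aman/data/jsonDoc1.txt")
    // convert each line to bytes so the implicit conversion applies
    val succinctData = data.map(_.getBytes).succinct.persist()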
2
votes
0 answers

Is the operation inside foreachRDD supposed to be blocking?

In a Spark Streaming job, is the operation inside foreachRDD supposed to be synchronous / blocking? What if you do some asynchronous operation that returns a Future? Are you then supposed to Await on that Future? Note: This question is specifically…
Mikael Ståldal
  • 374
  • 1
  • 3
  • 11
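One defensible pattern, sketched under the assumption that the asynchronous work must finish before the batch is considered done (asyncWrite below is a hypothetical sink, not a real API): block on the Future inside foreachRDD so the batch is not reported complete before the side effect lands.

    import scala.concurrent.duration._
    import scala.concurrent.{Await, Future}
    import org.apache.spark.streaming.dstream.DStream

    def writeEachBatch[T](stream: DStream[T], asyncWrite: Seq[T] => Future[Unit]): Unit =
      stream.foreachRDD { rdd =>
        val done = asyncWrite(rdd.collect().toSeq) // hypothetical async sink
        Await.result(done, 30.seconds)             // keep the batch effectively blocking
      }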
2
votes
1 answer

Using Futures with Spark Streaming & Cassandra (Scala)

I am rather new to Spark, and I wonder what the best practice is when using Spark Streaming with Cassandra. Usually, when performing IO, it is good practice to execute it inside a Future (in Scala). However, a lot of the spark-cassandra-connector…
EranM
  • 303
  • 1
  • 3
  • 14
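A sketch assuming the DataStax spark-cassandra-connector (keyspace and table names below are placeholders): saveToCassandra blocks, but it already parallelizes the writes across executors, so the Scala reflex of wrapping IO in a Future buys little here.

    import com.datastax.spark.connector._
    import org.apache.spark.streaming.dstream.DStream

    case class Event(id: String, payload: String)

    def save(events: DStream[Event]): Unit =
      events.foreachRDD { rdd =>
        // blocking per batch, but the work is distributed across the cluster
        rdd.saveToCassandra("my_keyspace", "events")
      }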
2
votes
0 answers

How to test Spark Streaming code

I have a class that pulls in RDDs from a Flume stream. I'd like to test it by having the test populate the stream. I thought using the queueStream method on StreamingContext would work, but I'm running into problems: I get NullPointerExceptions…
s d
  • 2,666
  • 4
  • 26
  • 42
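A minimal working sketch of queueStream-based testing; one frequent cause of NullPointerExceptions here is touching the queue or the stream before the StreamingContext is fully constructed.

    import scala.collection.mutable
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("queueStreamTest")
    val ssc = new StreamingContext(conf, Seconds(1))

    val queue = mutable.Queue[RDD[String]]()
    val stream = ssc.queueStream(queue)
    stream.foreachRDD(rdd => println(s"batch size = ${rdd.count()}"))

    queue += ssc.sparkContext.makeRDD(Seq("a", "b", "c")) // enqueue test data
    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop()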
2
votes
0 answers

Better method of connecting Spark Streaming, sockets and RabbitMQ

To get around the trouble of consuming RabbitMQ messages directly in Spark Streaming, I decided to consume messages using pika (the Python adapter) and send them over sockets, with the aim of getting Spark Streaming to consume the sent data via…
disruptive
  • 5,687
  • 15
  • 71
  • 135
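The Spark side of such a bridge is just the built-in socket source; a sketch, with host and port assumed (the pika-based producer would write one message per line to this socket). A custom Receiver, or one of the third-party RabbitMQ receivers, would avoid the extra hop.

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.dstream.DStream

    def rabbitViaSocket(ssc: StreamingContext): DStream[String] =
      // assumes newline-delimited messages arriving on localhost:9999
      ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)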
2
votes
1 answer

What is a supported streaming data source to persist results?

I'm trying to use the new streamed writing feature with Spark 2.0.1-SNAPSHOT. Which output data sources are actually supported to persist the results? I was able to display the output on the console with something like this: Dataset testData =…
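For Spark 2.0.x, the documented sinks besides console are the file (parquet) sink, memory, and foreach. A Scala sketch of the parquet one (paths are placeholders, and the question's testData is assumed to be a streaming Dataset):

    // the file sink requires a checkpoint location and Append output mode
    val query = testData.writeStream
      .format("parquet")
      .option("path", "/tmp/out")
      .option("checkpointLocation", "/tmp/ckpt")
      .start()
    query.awaitTermination()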
2
votes
0 answers

Spark gives a StackOverflowError when training using FPGrowth

I am using FPGrowth in Spark's MLlib to find frequent patterns. Here is my code: object FPGrowthExample{ def main(args:Array[String]){ val conf = new SparkConf().setAppName("FPGrowthExample") val sc = new SparkContext(conf) …
chenqun
  • 21
  • 2
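A sketch of a common mitigation, on the assumption that the overflow comes from deep recursion while mining a large FP-tree: raise the JVM thread stack size, and avoid starting with a very low minSupport.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth

    object FPGrowthExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("FPGrowthExample")
          .set("spark.executor.extraJavaOptions", "-Xss16m") // bigger thread stacks
        val sc = new SparkContext(conf)

        val transactions = sc.textFile("data/transactions.txt") // placeholder path
          .map(_.trim.split(' '))
        val model = new FPGrowth()
          .setMinSupport(0.1) // start high, lower gradually
          .setNumPartitions(10)
          .run(transactions)
        model.freqItemsets.take(10).foreach(println)
      }
    }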
2
votes
1 answer

Spark (streaming) RDD foreachPartitionAsync functionality/working

I will come to the actual question, but please bear with my use case first. I have the following use case; say I got rddStud from somewhere: val rddStud: RDD[(String,Student)] = ??? where 'String' is some random string and 'Student' is a case class…
K P
  • 861
  • 1
  • 8
  • 25
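How it behaves, in a sketch: foreachPartitionAsync submits the job and returns a FutureAction immediately, so the driver thread is not blocked, while the per-partition work still runs on the executors.

    import scala.concurrent.ExecutionContext.Implicits.global
    import org.apache.spark.rdd.RDD

    case class Student(name: String)

    def saveAsync(rddStud: RDD[(String, Student)]): Unit = {
      val done = rddStud.foreachPartitionAsync { iter =>
        iter.foreach { case (_, student) =>
          // write each record to an external store here
        }
      }
      // FutureAction is a scala.concurrent.Future; observe completion here
      done.onComplete(result => println(s"partition writes finished: $result"))
    }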
2
votes
1 answer

Saving protobuf in HBase/HDFS using Spark Streaming

I am looking to store protobuf messages in HBase/HDFS using Spark Streaming, and I have the following two questions: What is an efficient way of storing a huge number of protobuf messages, and an efficient way of retrieving them to do some analytics? For…
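One common approach for the HDFS half, sketched (not the only option): keep the raw protobuf bytes in SequenceFiles keyed by a message id; analytics jobs re-read them and parse with the generated protobuf classes.

    import org.apache.hadoop.io.BytesWritable
    import org.apache.spark.rdd.RDD

    def saveBatch(batch: RDD[(String, Array[Byte])], path: String): Unit =
      batch
        .map { case (id, bytes) => (id, new BytesWritable(bytes)) }
        .saveAsSequenceFile(path) // compact, splittable, HDFS-friendly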
2
votes
1 answer

Spark Streaming from Kafka: one task lags behind, causing the whole batch to slow down

I have a Spark Streaming application that reads data from Kafka over the network. It is important to note that the cluster and the Kafka servers are in different geographies. The average time to complete a job is around 8-10 minutes (I am running 10…
Sohaib
  • 4,556
  • 8
  • 40
  • 68