Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Since version 1.3.0, it supports exactly-once processing semantics, even in the face of failures.
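To make the tag description concrete, here is a minimal sketch of the classic Spark Streaming word count. It assumes the `spark-streaming` artifact is on the classpath and a text source on `localhost:9999` (e.g. `nc -lk 9999`); the application name and port are illustrative assumptions, not anything mandated by the API.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[2]")
    // Arriving data is batched into RDDs every 5 seconds.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical source: a text stream on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1L)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```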
Questions tagged [spark-streaming]
5565 questions
2
votes
1 answer
[Spark Streaming] How to load the model every time a new message comes in?
In Spark Streaming, every time a new message is received, a model is used to predict something based on this new message. But as time goes by, the model can change for some reason, so I want to reload the model whenever a new message comes in.…

Zefu Hu
- 33
- 4
2
votes
3 answers
Can't access kafka.serializer.StringDecoder
I have added the sbt packages for Kafka and Spark Streaming as follows:
"org.apache.spark" % "spark-streaming_2.10" % "1.6.1",
"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1"
However, when I want to use the Kafka direct stream… I can't…

mahdi62
- 959
- 2
- 11
- 17
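For reference, a `build.sbt` fragment of the following shape typically makes `kafka.serializer.StringDecoder` resolvable, since `spark-streaming-kafka` transitively pulls in the Kafka 0.8 client classes where `StringDecoder` lives. This is a sketch assuming the Scala 2.10 / Spark 1.6.1 versions given in the question.

```scala
// build.sbt sketch (versions assumed from the question).
// spark-streaming-kafka_2.10 transitively brings in the Kafka 0.8 client,
// which contains kafka.serializer.StringDecoder.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1"
)
```

Using `%%` lets sbt append the Scala binary version (`_2.10`) automatically, which avoids mixing Scala versions across the two artifacts.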
2
votes
1 answer
Not all Spark Workers are starting: SPARK_WORKER_INSTANCES
I have my spark-defaults.conf configured like this.
My node has 32 GB RAM and 8 cores.
I am planning to use 16 GB and 4 workers, with each using 1…

AKC
- 953
- 4
- 17
- 46
2
votes
0 answers
Spark Streaming maxRate is sometimes violated
I have a simple Spark Streaming process (1.6.1) which receives data from Azure Event Hub. I am experimenting with back pressure and maxRate settings. This is my configuration:
spark.streaming.backpressure.enabled =…

tmp123
- 23
- 4
2
votes
0 answers
Repartitioning a MapWithStateDStream
I am using a mapWithState function on a file stream and then performing some actions on the result. When I trace my application, I see that there are just 2 partitions after my mapWithState function for the mapped1 MapWithStateDStream. I wanted to know if I can…

mahdi62
- 959
- 2
- 11
- 17
2
votes
2 answers
Convert Hive SQL to Spark SQL
I want to convert my Hive SQL to Spark SQL to test query performance. Here is my Hive SQL. Can anyone suggest how to convert the Hive SQL to Spark SQL?
SELECT split(DTD.TRAN_RMKS,'/')[0] AS TRAB_RMK1,
split(DTD.TRAN_RMKS,'/')[1] AS…

Sree Eedupuganti
- 440
- 5
- 15
2
votes
1 answer
Twitter data from spark
I am learning Twitter integration with Spark Streaming.
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import…

subho
- 491
- 1
- 4
- 13
2
votes
2 answers
Spark streaming with kafka - restarting from checkpoint
We are building a fault-tolerant system using Spark Streaming and Kafka, and are testing Spark Streaming checkpointing to give us the option of restarting the Spark job if it crashes for any reason. Here's what our Spark process looks like:
Spark…

Shay
- 505
- 1
- 3
- 19
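The standard restart-from-checkpoint pattern is `StreamingContext.getOrCreate`: all DStream setup must happen inside the factory function, so that a fresh start and a recovery build the same graph. A sketch, with the checkpoint path, app name, and batch interval as illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical checkpoint location.
val checkpointDir = "hdfs:///checkpoints/my-streaming-app"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("KafkaStreaming")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... create the Kafka direct stream and wire up transformations here ...
  ssc
}

// On a clean start this calls createContext(); after a crash it rebuilds
// the context (including direct-stream offsets) from the checkpoint data.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

A common pitfall with this pattern is creating the stream outside the factory: the recovered context then has no DStream graph to restore and the job fails on restart.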
2
votes
1 answer
Specifying a timeout with mapWithState in Spark Streaming
I am following a sample of the mapWithState function on the Databricks website.
The code for trackStateFunc is as follows:
def trackStateFunc(batchTime: Time, key: String, value: Option[Int], state: State[Long]): Option[(String, Long)] = {
val sum =…

mahdi62
- 959
- 2
- 11
- 17
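For context, the timeout in `mapWithState` is configured on the `StateSpec`, not inside the tracking function itself; the function only observes `isTimingOut()` on its final invocation for an expiring key. A sketch based on the running-sum example the question refers to (the 60-second timeout is an assumed value):

```scala
import org.apache.spark.streaming.{Seconds, State, StateSpec, Time}

// Running sum per key, with a per-key idle timeout.
def trackStateFunc(batchTime: Time, key: String, value: Option[Int],
                   state: State[Long]): Option[(String, Long)] = {
  if (state.isTimingOut()) {
    // Key has been idle past the timeout; its state is about to be removed.
    // Calling state.update() here would throw.
    None
  } else {
    val sum = value.getOrElse(0).toLong + state.getOption().getOrElse(0L)
    state.update(sum)
    Some((key, sum))
  }
}

// The timeout is attached to the StateSpec:
val spec = StateSpec.function(trackStateFunc _).timeout(Seconds(60))
// stream.mapWithState(spec)  // where `stream` is a DStream[(String, Int)]
```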
2
votes
0 answers
Stateful stream processing with Spark DataFrames
Is it possible to achieve stateful stream processing with the Spark DataFrame API? The first thing I'd like to try is deduplicating the stream. DStream has a mapWithState method, but in order to convert it to DataFrames, I have to use foreachRDD:
dStream…

lizarisk
- 7,562
- 10
- 46
- 70
2
votes
0 answers
How to enable dynamic repartitioning in Spark Streaming for uneven data load
I have a use case where the input stream data is skewed; the volume of data can range from 0 to 50,000 events per batch. Each data entry is independent of the others. Therefore, to avoid the shuffle caused by repartitioning, I want to use some kind of dynamic…

Alchemist
- 849
- 2
- 10
- 27
2
votes
1 answer
Debug, Warn & Info messages from non-main class not visible in spark executor logging
We've tried a variety of solutions, including changing the log4j.properties file, copying the file to the executors via --file and then telling them to use it as an argument passed to Spark via --conf, and also tried updating the configuration of the EMR…

null
- 3,469
- 7
- 41
- 90
2
votes
1 answer
Spark Streaming: How to load a Pipeline on a Stream?
I am implementing a lambda architecture system for stream processing.
I have no issue creating a Pipeline with GridSearch in Spark Batch:
pipeline = Pipeline(stages=[data1_indexer, data2_indexer, ..., assembler, logistic_regressor])
paramGrid =…

Manuel G
- 1,523
- 1
- 21
- 34
2
votes
2 answers
Submitting Spark Job On Scheduler Pool
I am running a Spark Streaming job in cluster mode. I have created a pool with 200 GB of memory (CDH). I want to run my Spark Streaming job on that pool, so I tried setting
sc.setLocalProperty("spark.scheduler.pool", "pool")
in code, but it's not…

Justin
- 735
- 1
- 15
- 32
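One point worth separating here: a CDH resource pool with a memory cap is a YARN queue, which is distinct from Spark's internal fair-scheduler pools that `setLocalProperty` targets. A sketch of how each is typically selected (the queue and pool names below are assumptions taken from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// YARN queue (what a CDH resource pool with a memory limit usually is):
// selected at submit time, e.g. `spark-submit --queue pool ...`,
// or in the configuration before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("StreamingJob")
  .set("spark.yarn.queue", "pool")       // assumed queue name
  .set("spark.scheduler.mode", "FAIR")   // required for Spark-internal pools

val sc = new SparkContext(conf)

// Spark-internal fair-scheduler pool: only affects scheduling *within*
// this application, and applies to jobs submitted from the current thread.
sc.setLocalProperty("spark.scheduler.pool", "pool")
```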
2
votes
0 answers
Failure to reload from checkpoint directory
When I tried reloading my spark streaming application from a checkpoint directory, I got the following exception:
java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist:…

mahdi62
- 959
- 2
- 11
- 17