Questions tagged [spark-streaming]

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. As of version 1.3.0, it supports exactly-once processing semantics, even in the face of failures.

5565 questions
2
votes
1 answer

Trying to understand how Spark Streaming works?

This might be a stupid question, but I can't seem to find any doc clarifying this in plain English (ok, exaggerated), and after reading the official doc and some blogs, I'm still confused about how the driver and executors work. Here is my current…
avocado
  • 2,615
  • 3
  • 24
  • 43
2
votes
1 answer

H2O Spark streaming 2.1 distribution

I have been intermittently getting a distribution error when running a sample IRIS model in Sparkling Water. Sparkling Water: 2.1; Spark streaming kafka: 0.10.0.0; running locally using spark-submit, master only. DistributedException from xxx:54321,…
Lalit Agarwal
  • 2,354
  • 1
  • 14
  • 18
2
votes
1 answer

Stateful streaming Spark processing

I'm learning Spark and trying to build a simple streaming service. For example, I have a Kafka queue and a Spark job like word count. That example uses a stateless mode. I'd like to accumulate word counts, so if test has been sent a few times in…
kikulikov
  • 2,512
  • 4
  • 29
  • 45
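In Spark Streaming, accumulating counts across micro-batches is what stateful operators like `updateStateByKey` (or `mapWithState`) provide: each batch's new values are merged into a running state per key. Outside Spark, the core merge logic can be sketched in plain Python (the batches and words here are made-up sample data, not from the question):

```python
from collections import Counter

def update_state(state, batch_words):
    """Merge one micro-batch of words into the running counts,
    mimicking what updateStateByKey's update function does per key."""
    merged = Counter(state)
    merged.update(batch_words)
    return dict(merged)

# Simulate three micro-batches arriving from the Kafka queue.
state = {}
for batch in [["test", "spark"], ["test"], ["test", "kafka"]]:
    state = update_state(state, batch)

print(state)  # counts accumulate across batches: {'test': 3, 'spark': 1, 'kafka': 1}
```

With real Spark Streaming the same merge function would be passed to `updateStateByKey`, and checkpointing must be enabled so the state survives failures.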
2
votes
1 answer

How to calculate z-score with the DataFrame API in Apache Spark structured streaming?

I'm currently struggling with the following: the z-score is defined as z = (x - u) / sd (where x is the individual value, u the mean of the window, and sd the standard deviation of the window). I can calculate u and sd on the window but don't know how to…
Romeo Kienzler
  • 3,373
  • 3
  • 36
  • 58
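The arithmetic the question describes is straightforward once u and sd are available: each value in the window is shifted by the mean and scaled by the standard deviation. In Spark one would compute u and sd as window aggregates and derive the column from them; the plain-Python sketch below shows just the per-window math (`window_zscores` is a hypothetical helper, and population standard deviation is assumed):

```python
import statistics

def window_zscores(window):
    """Compute z = (x - u) / sd for every value x in a window,
    using the window's own mean and population standard deviation."""
    u = statistics.mean(window)
    sd = statistics.pstdev(window)
    return [(x - u) / sd for x in window]

scores = window_zscores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Whether to use the population (`pstdev`) or sample (`stdev`) standard deviation depends on the use case; Spark's `stddev` defaults to the sample version (`stddev_samp`), so the two can differ slightly.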
2
votes
2 answers

How to spark-submit a Spark Streaming application with spark-streaming-kafka-0-8 dependency?

I am trying to run the Spark Streaming example DirectKafkaWordCount.scala. To create the jar I am using build.sbt with the plugin: name := "Kafka Direct" version := "1.0" scalaVersion := "2.11.6" libraryDependencies ++= Seq ("org.apache.spark" %…
Angshusuri
  • 57
  • 3
  • 10
2
votes
2 answers

How can I write the results of a JavaPairDStream to an output Kafka topic in Spark Streaming?

I'm looking for a way to write a DStream to an output Kafka topic, only when the micro-batch RDDs actually spit out something. I'm using Spark Streaming and the spark-streaming-kafka connector in Java 8 (both latest versions), but I cannot figure it out. Thanks for the…
Aniello Guarino
  • 197
  • 2
  • 10
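The usual pattern for this is `foreachRDD` with an emptiness check (`rdd.isEmpty()`) before creating a producer and sending, so that empty micro-batches trigger no output. The gating logic, stripped of Spark and Kafka specifics, can be sketched in plain Python (`forward_nonempty_batches` and `send` are hypothetical stand-ins, not real connector APIs):

```python
def forward_nonempty_batches(batches, send):
    """Mimic foreachRDD gating: forward only micro-batches that
    actually produced records, skipping empty ones entirely."""
    forwarded = []
    for batch in batches:
        if not batch:           # analogue of rdd.isEmpty()
            continue            # no producer work for empty batches
        for record in batch:
            send(record)
        forwarded.append(batch)
    return forwarded

out = []
forward_nonempty_batches([["a", "b"], [], ["c"]], out.append)
print(out)  # only records from non-empty batches: ['a', 'b', 'c']
```

In the real Java version, the producer should be created (or fetched from a pool) inside the `foreachPartition` closure on the executors, not on the driver, since producers are not serializable.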
2
votes
1 answer

Pyspark dataframe: split JSON column values into multiple top-level columns

I have a JSON column which can contain any number of key:value pairs. I want to create new top-level columns for these key:value pairs. For example, if I have this data: A B "{\"C\":\"c\" , \"D\":\"d\"...}" b This…
gashu
  • 863
  • 2
  • 10
  • 21
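The reshaping being asked for is: parse the JSON string in each row and promote its keys to sibling columns (in PySpark this is typically done with `from_json` plus a schema, or `get_json_object` per key). The plain-Python equivalent of that reshaping, on made-up sample rows, looks like this (`explode_json_column` is a hypothetical helper):

```python
import json

def explode_json_column(rows, json_field):
    """Promote the key:value pairs inside a JSON-string column to
    top-level fields, dropping the original JSON column."""
    out = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != json_field}
        flat.update(json.loads(row[json_field]))  # keys become top-level fields
        out.append(flat)
    return out

rows = [{"A": "a1", "B": '{"C": "c", "D": "d"}'}]
print(explode_json_column(rows, "B"))  # [{'A': 'a1', 'C': 'c', 'D': 'd'}]
```

Note the caveat that makes the Spark version harder: when the set of keys varies per row, the schema for `from_json` must cover the union of all keys, or the JSON must first be read as a `MapType` and the keys pivoted out.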
2
votes
1 answer

Spark 2.1 Structured Streaming - Using Kafka as source with Python (pyspark)

With Apache Spark version 2.1, I would like to use Kafka (0.10.0.2.5) as a source for Structured Streaming with pyspark: kafka_app.py: from pyspark.sql import…
JS G.
  • 158
  • 1
  • 9
2
votes
1 answer

Spark Streaming Dynamic Allocation ExecutorAllocationManager

We have a Spark 2.1 streaming application with mapWithState, enabling spark.streaming.dynamicAllocation.enabled=true. The pipeline is as follows: var rdd_out = ssc.textFileStream() .map(convertToEvent(_)) .combineByKey(...., new…
Joe Bledo
  • 21
  • 2
2
votes
1 answer

How to include a GROUP BY alias column in the DataFrame SELECT list

I am doing SUM on multiple columns, and I want to include those columns in the SELECT list. Below is my work: val df=df0 .join(df1, df1("Col1")<=>df0("Col1")) .filter((df1("Colum")==="00") …
sks
  • 169
  • 4
  • 15
2
votes
2 answers

Spark Streaming - Count distinct element in state

I have a DStream with key-value pairs of VideoID-UserID. What is a good practice for counting distinct UserIDs grouped by VideoID? // VideoID,UserID foo,1 foo,2 bar,1 bar,2 foo,1 bar,2 As above, I want to get VideoID-CountUserID by removing…
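The core computation is: group the pairs by VideoID, deduplicate the UserIDs within each group, then count. In Spark this maps to something like mapping each pair to a singleton set and reducing by key with set union (or `countApproxDistinctByKey` when an approximation suffices). A plain-Python sketch of the exact version, using the question's sample pairs (`distinct_users_per_video` is a hypothetical helper):

```python
from collections import defaultdict

def distinct_users_per_video(pairs):
    """Count distinct UserIDs per VideoID by collecting a user set
    per key, the same idea as reduceByKey over sets in Spark."""
    seen = defaultdict(set)
    for video, user in pairs:
        seen[video].add(user)          # set membership deduplicates repeats
    return {video: len(users) for video, users in seen.items()}

pairs = [("foo", 1), ("foo", 2), ("bar", 1), ("bar", 2), ("foo", 1), ("bar", 2)]
print(distinct_users_per_video(pairs))  # {'foo': 2, 'bar': 2}
```

On a real stream the per-key sets can grow without bound, which is why approximate structures such as HyperLogLog (what `countApproxDistinct` uses) are often preferred for long-running jobs.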
2
votes
1 answer

Spark Java: How to move data from HTTP source to Couchbase sink?

I have a .gz file available on a web server that I want to consume in a streaming manner and insert the data into Couchbase. The .gz file contains only one file, which in turn contains one JSON object per line. Since Spark doesn't have an HTTP…
Abhijit Sarkar
  • 21,927
  • 20
  • 110
  • 219
2
votes
3 answers

Spark + Kafka streaming NoClassDefFoundError kafka/serializer/StringDecoder

I'm trying to send messages from my Kafka producer and stream them in Spark Streaming. But I'm getting the following error when I run my application with spark-submit. Error Exception in thread "main" java.lang.NoClassDefFoundError:…
Gaurav Ram
  • 1,085
  • 3
  • 16
  • 32
2
votes
1 answer

Exception in thread "main" java.lang.NoClassDefFoundError: org/spark_project/guava/cache/CacheLoader

When I am trying to execute my Kafka Spark project, I get the error below: Exception in thread "main" java.lang.NoClassDefFoundError: org/spark_project/guava/cache/CacheLoader at…
2
votes
1 answer

Application hangs when I do join for PipelinedRDD and RDD from DStream

I use Spark 1.6.0 with Spark Streaming and have a problem with wide operations. Code example: there is an RDD called "a" of type 'pyspark.rdd.PipelinedRDD'. "a" was received as: # Load a text file and convert each line to a Row. …