Questions tagged [dstream]

Discretized Streams (D-Stream) is an approach that handles streaming computations as a series of deterministic batch computations on small time intervals. The input data received during each interval is stored reliably across the cluster to form an input dataset for that interval. Once the time interval completes, this dataset is processed via deterministic parallel operations, such as map, reduce, and groupBy, to produce new datasets representing program outputs or intermediate state.

109 questions
1 vote • 1 answer

Perform Multiple Transformations on a DStream

I am fairly new to Spark Streaming. I have streaming data containing two values, x and y, for example: 1 300, 2 8754, 3 287, etc. Out of the streamed data, I want to get the smallest y value, the largest y value, and the mean of the x values. This needs to be…
Tsume • 907 • 2 • 11 • 21
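
For a question like the one above, one single-pass approach is to fold each microbatch into a (min, max, sum, count) tuple. A minimal sketch, assuming the input arrives as "x y" lines on a socket; the host, port, and batch interval are illustrative, not from the question:

```scala
// A sketch, not the asker's code: track (min y, max y, sum x, count) in one pass.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamStats {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("StreamStats").setMaster("local[2]"), Seconds(5))

    // Hypothetical source: "x y" lines, e.g. "1 300", arriving on a socket.
    val pairs = ssc.socketTextStream("localhost", 9999)
      .map(_.split("\\s+"))
      .map(a => (a(0).toDouble, a(1).toDouble))

    pairs.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // Fold each microbatch into (minY, maxY, sumX, count).
        val (minY, maxY, sumX, n) = rdd
          .map { case (x, y) => (y, y, x, 1L) }
          .reduce { case ((a1, b1, c1, d1), (a2, b2, c2, d2)) =>
            (math.min(a1, a2), math.max(b1, b2), c1 + c2, d1 + d2)
          }
        println(s"min y = $minY, max y = $maxY, mean x = ${sumX / n}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```
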
1 vote • 3 answers

Merge Spark DStream with a variable to saveToCassandra()

I have a DStream[String, Int] with pairs of word counts, e.g. ("hello" -> 10). I want to write these counts to Cassandra with a step index. The index is initialized as var step = 1 and is incremented with each microbatch processed. The Cassandra…
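
A minimal sketch of the step-index pattern, assuming the spark-cassandra-connector is available; the keyspace, table, and column names are hypothetical. Incrementing the driver-side var inside foreachRDD works because that block runs on the driver once per microbatch:

```scala
// A sketch: tag each microbatch's records with a step index before saving.
import com.datastax.spark.connector._ // spark-cassandra-connector
import org.apache.spark.streaming.dstream.DStream

object StepWriter {
  var step = 1 // driver-side counter, bumped once per microbatch

  def saveWithStep(wordCounts: DStream[(String, Int)]): Unit =
    wordCounts.foreachRDD { rdd =>
      val currentStep = step // capture into a local val for the executor closure
      rdd.map { case (word, count) => (word, count, currentStep) }
        .saveToCassandra("ks", "word_counts", SomeColumns("word", "count", "step"))
      step += 1
    }
}
```
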
1 vote • 0 answers

Spark Streaming: From DStream to pandas DataFrame

In the snippet below I try to transform a DStream of temperatures (received from Kafka) into a pandas Dataframe. def main_process(time, dStream): print("========= %s =========" % str(time)) try: # Get the singleton instance of SparkSession …
HappyCane • 363 • 1 • 2 • 10
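
The question's snippet is PySpark; below is the same foreachRDD-to-DataFrame pattern sketched in Scala for consistency with the other examples here (in PySpark the final step would be df.toPandas()). The stream's element type and the column name are assumptions:

```scala
// A Scala sketch of the singleton-SparkSession pattern inside foreachRDD.
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.DStream

def toDataFrames(temperatures: DStream[Double]): Unit =
  temperatures.foreachRDD { (rdd, time) =>
    println(s"========= $time =========")
    // getOrCreate reuses one SparkSession across batches.
    val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
    import spark.implicits._
    val df = rdd.toDF("temperature") // in PySpark, df.toPandas() from here
    df.show()
  }
```
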
1 vote • 0 answers

Spark JSON DStream Print() / saveAsTextFiles not working

Issue description: Spark version 1.6.2; execution: spark-shell (REPL); master = local[2] (also tried local[*]). example.json is as below: {"name":"D2" ,"lovesPandas":"Y"} {"name":"D3" ,"lovesPandas":"Y"} {"name":"D4" ,"lovesPandas":"Y"} {"name":"D5"…
RGuy • 11 • 2
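
A common cause here is that textFileStream only picks up files created in the monitored directory after the context starts. A minimal spark-shell-style sketch; the paths and batch interval are illustrative:

```scala
// spark-shell style; sc is the shell's SparkContext.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))

// Only files moved into this directory *after* ssc.start() are picked up.
val lines = ssc.textFileStream("/tmp/stream-in")
val pandaLovers = lines.filter(_.contains("\"lovesPandas\":\"Y\""))

pandaLovers.print()                                 // sample of each batch on the driver
pandaLovers.saveAsTextFiles("/tmp/stream-out/part") // one directory per batch

ssc.start()
ssc.awaitTermination()
```
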
1 vote • 2 answers

Reading data from HBase through Spark Streaming

So my project flow is Kafka -> Spark Streaming -> HBase. Now I want to read the data back from HBase, going over the table created by the previous job, do some aggregation, and store it in another table in a different column format: Kafka -> Spark…
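
A minimal sketch of the read-back step, assuming the HBase client and TableInputFormat are on the classpath; the table name is hypothetical and sc is an existing SparkContext (e.g. ssc.sparkContext in the streaming job):

```scala
// A sketch of reading an HBase table back as an RDD; table name is hypothetical.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "table_from_previous_job")

val hbaseRdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println(s"rows read back from HBase: ${hbaseRdd.count()}") // aggregate from here
```
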
1 vote • 1 answer

How to get the cartesian product of two DStreams in Spark Streaming with Scala?

I have two DStreams. Let A:DStream[X] and B:DStream[Y]. I want to get the cartesian product of them, in other words, a new C:DStream[(X, Y)] containing all the pairs of X and Y values. I know there is a cartesian function for RDDs. I was only able…
Coukaratcha • 133 • 2 • 11
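
The usual answer is transformWith, which pairs up the two streams' RDDs batch by batch so RDD.cartesian can be applied. A minimal sketch, assuming both DStreams share the same StreamingContext and batch interval:

```scala
// A sketch: per-batch cartesian product via transformWith.
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

def cartesian[X: ClassTag, Y: ClassTag](a: DStream[X], b: DStream[Y]): DStream[(X, Y)] =
  a.transformWith(b, (ra: RDD[X], rb: RDD[Y]) => ra.cartesian(rb))
```
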
1 vote • 1 answer

Distinct Elements across DStreams

I am working on windowed DStreams wherein each DStream contains 3 RDDs with the following keys: a,b,c; b,c,d; c,d,e; d,e,f. I want to get only the unique keys across all DStreams: a,b,c,d,e,f. How to do it in Spark Streaming?
vkb • 458 • 1 • 7 • 18
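
One approach is to window the stream (which unions the RDDs falling in each window) and then de-duplicate per windowed batch via transform. A minimal sketch; the window and slide durations are illustrative:

```scala
// A sketch: union the RDDs in each window, then de-duplicate per windowed batch.
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

def uniqueKeys(keys: DStream[String]): DStream[String] =
  keys.window(Seconds(30), Seconds(10)) // covers the RDDs of the last 30s
      .transform(_.distinct())          // one distinct() per window
```
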
1 vote • 1 answer

How to Combine two DStreams using PySpark (similar to .zip on normal RDDs)

I know that we can combine (like cbind in R) two RDDs as below in PySpark: rdd3 = rdd1.zip(rdd2). I want to perform the same for two DStreams in PySpark. Is it possible, or are there any alternatives? In fact, I am using an MLlib random forest model to predict…
Obaid • 237 • 2 • 14
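
The question is PySpark, but the idea carries over: transformWith lets you zip the corresponding batch RDDs. A Scala sketch, with the usual zip caveat that both RDDs need the same number of partitions and the same number of elements per partition:

```scala
// A sketch: zip corresponding microbatches of two DStreams.
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

def zipStreams[A: ClassTag, B: ClassTag](a: DStream[A], b: DStream[B]): DStream[(A, B)] =
  a.transformWith(b, (ra: RDD[A], rb: RDD[B]) => ra.zip(rb))
```
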
1 vote • 2 answers

How to solve Type mismatch issue (expected: Double, actual: Unit)

Here is my function that calculates the root mean squared error. However, the last line does not compile, with the error "Type mismatch (expected: Double, actual: Unit)". I tried many different ways to solve this issue, but still without…
Klue • 1,317 • 5 • 22 • 43
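
The usual cause of this error is that the function's last expression is a println (which returns Unit) rather than the computed Double. A minimal sketch of an RMSE function that returns its value; the body is illustrative, not the asker's code:

```scala
// A sketch: make the Double the last expression instead of a println.
import org.apache.spark.rdd.RDD

def rmse(predictionsAndLabels: RDD[(Double, Double)]): Double = {
  val mse = predictionsAndLabels
    .map { case (prediction, label) => val d = prediction - label; d * d }
    .mean()
  math.sqrt(mse) // return the value; ending with println(...) would yield Unit
}
```
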
1 vote • 1 answer

combineByKey on a DStream throws an error

I have a DStream with tuples (String, Int) in it. When I try combineByKey, it tells me to specify the parameter Partitioner: my_dstream.combineByKey( (v) => (v,1), (acc:(Int, Int), v) => (acc._1 + v, acc._2 + 1), (acc1:(Int, Int),…
Vadym B. • 681 • 7 • 21
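
Unlike RDD.combineByKey, the DStream variant has no overload that defaults the partitioner, so one must be passed explicitly. A minimal sketch using HashPartitioner; the partition count and the per-key-average use case are illustrative:

```scala
// A sketch: DStream.combineByKey requires an explicit Partitioner.
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

def averagePerKey(pairs: DStream[(String, Int)]): DStream[(String, Double)] =
  pairs.combineByKey[(Int, Int)](
      (v: Int) => (v, 1),                                          // createCombiner
      (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),       // mergeValue
      (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2), // mergeCombiners
      new HashPartitioner(4))
    .mapValues { case (sum, count) => sum.toDouble / count }
```
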
1 vote • 0 answers

Parallel reduceByKeyAndWindow()s with different time values

I am working on Spark Streaming on a use case which demands 4 different outputs computed on different window lengths. In particular, I need my program to output the result of the computation every second based on 4 different time windows (windows…
luke • 375 • 1 • 2 • 12
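
One way to express this is to derive all four windowed streams from the same source, giving each reduceByKeyAndWindow its own window length but a common one-second slide. A sketch, assuming a one-second batch interval (window and slide durations must be multiples of it); the window lengths are illustrative:

```scala
// A sketch: four window lengths over one source, all sliding every second.
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

def windowedCounts(events: DStream[(String, Long)]): Seq[DStream[(String, Long)]] =
  Seq(5, 15, 30, 60).map { w => // window lengths in seconds (illustrative)
    events.reduceByKeyAndWindow(_ + _, Seconds(w), Seconds(1))
  }
```
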
1 vote • 3 answers

Spark Streaming not distributing tasks to nodes on cluster

I have a two-node standalone cluster for Spark stream processing. Below is my sample code, which demonstrates the process I am executing. sparkConf.setMaster("spark://rsplws224:7077") val ssc=new StreamingContext() println(ssc.sparkContext.master) val…
Jigar Parekh • 6,163 • 7 • 44 • 64
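
Two things worth checking in this situation: a master hardcoded via setMaster can shadow whatever spark-submit passes, and in a receiver-based job each receiver permanently occupies a core, so the application needs more cores than receivers for any batch to be processed. A configuration sketch; the values are illustrative:

```scala
// A configuration sketch; values are illustrative.
import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .setAppName("StreamApp")
  // No setMaster here: pass --master spark://rsplws224:7077 to spark-submit
  // instead, so a hardcoded local master cannot shadow the cluster one.
  .set("spark.cores.max", "4") // keep this above the number of receivers
```
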
0 votes • 0 answers

How to use a DataFrame, which is created from a DStream, outside of the foreachRDD block?

I have been trying to work with Spark Streaming. My problem is that I want to use wordCountsDataFrame again outside of the foreachRDD block; I want to conditionally join wordCountsDataFrame with another DataFrame that is created from a DStream. Is there any…
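
One workaround is to publish the per-batch DataFrame from inside foreachRDD, either as a driver-side reference or as a temp view, so later code can join against it. A minimal sketch; the names and the @volatile holder are hypothetical:

```scala
// A sketch: expose the per-batch DataFrame via a temp view / driver reference.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.streaming.dstream.DStream

object CapturedCounts {
  @volatile var wordCountsDataFrame: Option[DataFrame] = None // driver-side handle

  def capture(words: DStream[String]): Unit =
    words.foreachRDD { rdd =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._
      val df = rdd.map((_, 1)).toDF("word", "count")
      df.createOrReplaceTempView("word_counts") // joinable via spark.table(...)
      wordCountsDataFrame = Some(df)
    }
}
```
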
0 votes • 2 answers

How to calculate average by category in PySpark Streaming?

I have CSV data coming as DStreams from traffic counters. A sample is as…
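
The grouped-average pattern for this (sketched in Scala for consistency with the other examples; the question itself is PySpark) is to reduce (sum, count) pairs per key and divide. The CSV layout assumed here, "category,value", is hypothetical:

```scala
// A sketch: per-batch mean per category from "category,value" lines.
import org.apache.spark.streaming.dstream.DStream

def averageByCategory(lines: DStream[String]): DStream[(String, Double)] =
  lines.map(_.split(","))
    .map(f => (f(0), (f(1).toDouble, 1)))              // (category, (value, 1))
    .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)) // (sum, count) per key
    .mapValues { case (sum, count) => sum / count }
```
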
0 votes • 1 answer

Read Avro records from Kafka using Spark DStreams

I'm using Spark 2.3 and trying to stream data from Kafka using DStreams (using DStreams to achieve a specific use case which we were not able to with Structured Streaming). The Kafka topic contains data in Avro format. I want to read that data…
BHC • 77 • 9
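
A minimal sketch, assuming the spark-streaming-kafka-0-10 integration and that the Avro writer schema is known to the consumer; the broker, topic, schema, and group id are hypothetical (with a schema registry, the decoding step would differ):

```scala
// A sketch: Kafka byte-array values decoded as Avro GenericRecords.
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val ssc = new StreamingContext(new SparkConf().setAppName("AvroReader"), Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092", // hypothetical
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[ByteArrayDeserializer],
  "group.id" -> "avro-reader")

// Hypothetical writer schema; with a schema registry it would be fetched instead.
val schemaJson =
  """{"type":"record","name":"Reading","fields":[{"name":"value","type":"double"}]}"""

val stream = KafkaUtils.createDirectStream[String, Array[Byte]](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, Array[Byte]](Seq("readings"), kafkaParams))

// Parse the schema once per partition, then decode each Avro payload.
val decoded = stream.mapPartitions { records =>
  val reader =
    new GenericDatumReader[GenericRecord](new Schema.Parser().parse(schemaJson))
  records.map { r =>
    reader.read(null, DecoderFactory.get.binaryDecoder(r.value(), null)).toString
  }
}

decoded.print()
ssc.start()
ssc.awaitTermination()
```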