
I am currently working on a small Spark Streaming job to compute a stock correlation matrix from a DStream.

From a DStream[(time, quote)], I need to aggregate the quotes (Double) by time (Long) across multiple RDDs, before computing the correlations (considering all the quotes of all the RDDs):

dstream.reduceByKeyAndWindow(/* aggregate quotes into Vectors */, Seconds(60))
       .foreachRDD { rdd => Statistics.corr(rdd.values) } // rdd.values: RDD[Vector]
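
To make this concrete, here is a fuller, self-contained sketch of what I have in mind. The 60-second window and the Array-based aggregation are just assumptions on my part, and dstream stands for my input DStream[(Long, Double)]:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.streaming.Seconds

// dstream: DStream[(Long, Double)] of (time, quote) pairs
val aggregated = dstream
  .mapValues(q => Array(q))                             // wrap each quote in a one-element array
  .reduceByKeyAndWindow((a, b) => a ++ b, Seconds(60))  // concatenate all quotes seen for a timestamp

aggregated.foreachRDD { rdd =>
  // Each timestamp becomes one observation row; Statistics.corr
  // expects all rows to have the same length (one quote per stock).
  val rows = rdd.values.map(arr => Vectors.dense(arr))
  if (!rows.isEmpty()) println(Statistics.corr(rows))
}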

To my mind, this could work if the resulting DStream (from reduceByKeyAndWindow) contains only one RDD holding all the aggregated quotes.

But I am not sure. How is the data distributed after reduceByKeyAndWindow? Is there a way to merge the RDDs of a DStream?
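
The only workaround I can think of (untested) is to force each windowed RDD into a single partition before computing, though I realize coalesce only merges partitions within one RDD, not RDDs across batches, which is part of my confusion:

// Untested idea: coalesce each windowed RDD to one partition so that
// all aggregated quotes sit together before corr runs.
aggregated.transform(rdd => rdd.coalesce(1))
          .foreachRDD { rdd => Statistics.corr(rdd.values.map(arr => Vectors.dense(arr))) }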

Bart
