Are the following two the same?

val dstream = stream.window(Seconds(60), Seconds(1))
val x = dstream.map(x => ...)

and

val dstream = stream.window(Seconds(60), Seconds(1))
val x = dstream.transform(rdd => rdd.map(x => ...))
Jeffrey Chung
pythonic
  • Exact duplicate in my opinion, may be helpful :) – T. Gawęda Oct 05 '17 at 21:25
  • Just tell if the above two are same or not. The question you have linked, still does not make this thing clear. From what I understand, these two will give the same output. – pythonic Oct 06 '17 at 13:22

1 Answer

map(func) Return a new DStream by passing each element of the source DStream through a function func.

and

transform(func) Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.

In short, transform lets you apply any of Spark's RDD transformations to the RDDs underlying the stream, while map applies an element-to-element function.

Essentially, map works on the elements of the DStream and transform works on its RDDs (map operates on each row, transform on each RDD). For the two snippets in the question, dstream.transform(rdd => rdd.map(f)) yields the same output as dstream.map(f); transform only becomes necessary when you need an RDD-level operation that DStream does not expose directly.

http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
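To make the comparison concrete, here is a minimal sketch of the two pipelines from the question running side by side. The object name MapVsTransform, the helper build, the doubling function, and the queue-backed test input are all assumptions made for illustration; only window, map, and transform come from the question. It assumes spark-core and spark-streaming are on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import scala.collection.mutable

object MapVsTransform {
  // Hypothetical element-level function standing in for the "..." in the question.
  val double: Int => Int = _ * 2

  // Builds both variants from the same windowed stream.
  def build(stream: DStream[Int]): (DStream[Int], DStream[Int]) = {
    val windowed = stream.window(Seconds(60), Seconds(1))
    val viaMap = windowed.map(double)                            // element-level API
    val viaTransform = windowed.transform(rdd => rdd.map(double)) // RDD-level API
    (viaMap, viaTransform)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("MapVsTransform")
    val ssc = new StreamingContext(conf, Seconds(1))

    // A queue of RDDs serves as a small in-memory test source (an assumption,
    // not part of the question).
    val queue = mutable.Queue[RDD[Int]](ssc.sparkContext.makeRDD(1 to 5))
    val (viaMap, viaTransform) = build(ssc.queueStream(queue))

    // Both streams should print the same elements for every batch.
    viaMap.print()
    viaTransform.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Both variants apply the same function to every element of every windowed RDD, so the printed batches should match.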

map Example

val clicks: DStream[...] = ...
val mappedClicks: ... = clicks.map(...)

transform signatures

transform(transformFunc: RDD[T] => RDD[U]): DStream[U]
transform(transformFunc: (RDD[T], Time) => RDD[U]): DStream[U]
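As a rough illustration of where transform goes beyond map, the sketch below uses each of the two overloads once. The object name TransformSketch, the method names, and the (String, Int) element type are hypothetical; sortBy and Time.milliseconds are standard Spark calls. It assumes spark-core and spark-streaming are on the classpath.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object TransformSketch {
  // Single-argument overload: sort each batch by count, descending.
  // Sorting is an RDD-level operation, so map alone cannot express it.
  def sortedBatches(counts: DStream[(String, Int)]): DStream[(String, Int)] =
    counts.transform((rdd: RDD[(String, Int)]) => rdd.sortBy(_._2, ascending = false))

  // Two-argument overload: tag every element with its batch time.
  def stamped(counts: DStream[(String, Int)]): DStream[(Long, String, Int)] =
    counts.transform { (rdd: RDD[(String, Int)], time: Time) =>
      val batchMs = time.milliseconds // read on the driver, captured by the closure
      rdd.map { case (word, n) => (batchMs, word, n) }
    }
}
```

The explicit parameter types on the function literals are there to keep overload resolution between the two transform variants unambiguous.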
vaquar khan