Are the following two the same?
val dstream = stream.window(Seconds(60), Seconds(1))
val x = dstream.map(x => ...)
and
val dstream = stream.window(Seconds(60), Seconds(1))
val x = dstream.transform(rdd => rdd.map(x => ...))
map(func) Return a new DStream by passing each element of the source DStream through a function func.
and
transform(func) Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.
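In short, transform lets you apply any of Apache Spark's RDD transformations to the RDDs underlying the stream, whereas map is an element-to-element transformation.
Essentially, map works on the elements of the DStream, while transform lets you work with the RDDs of the DStream (map is applied to each record, transform to each RDD). For the two snippets in the question the result is the same, because the function passed to transform does nothing but call rdd.map. transform becomes necessary only when you need an RDD operation that the DStream API does not expose.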
http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
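To check the equivalence from the question, here is a minimal sketch that runs both variants side by side on the same windowed stream. The object name MapVsTransform, the local[2] master, the queueStream test input and the doubling function are assumptions made for illustration and are not part of the original question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object MapVsTransform {
  def main(args: Array[String]): Unit = {
    // Assumption: local run with two threads and a 1-second batch interval.
    val conf = new SparkConf().setMaster("local[2]").setAppName("MapVsTransform")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: a queue-backed test stream; each queued RDD becomes one batch.
    val queue = mutable.Queue[RDD[Int]](
      ssc.sparkContext.parallelize(Seq(1, 2, 3)),
      ssc.sparkContext.parallelize(Seq(4, 5)))
    val stream = ssc.queueStream(queue)

    val windowed = stream.window(Seconds(60), Seconds(1))

    // Element-level: the function is applied to every record of the DStream.
    val viaMap = windowed.map(x => x * 2)

    // RDD-level: the function receives each window's RDD and maps over it.
    val viaTransform = windowed.transform((rdd: RDD[Int]) => rdd.map(x => x * 2))

    // Both streams should print the same elements for every window.
    viaMap.print()
    viaTransform.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

When the RDD function is just rdd.map(f), the map form is the simpler way to write it; transform adds nothing in that case.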
map Example
val clicks: DStream[...] = ...
val mappedClicks: ... = clicks.map(...)
transform Example (the two available signatures)
transform(transformFunc: RDD[T] => RDD[U]): DStream[U]
transform(transformFunc: (RDD[T], Time) => RDD[U]): DStream[U]
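As a hedged sketch of where transform goes beyond map, the snippet below uses both signatures: the one-argument form to sort each batch's RDD (a DStream has no sortBy of its own), and the two-argument form to tag every record with its batch time. The object name TransformSketch, the queue-backed input and the sample values are assumptions for illustration only.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import scala.collection.mutable

object TransformSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local run with a 1-second batch interval.
    val conf = new SparkConf().setMaster("local[2]").setAppName("TransformSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: a queue-backed test stream with a single batch.
    val queue = mutable.Queue[RDD[Int]](
      ssc.sparkContext.parallelize(Seq(3, 1, 2)))
    val stream = ssc.queueStream(queue)

    // RDD[T] => RDD[U]: an arbitrary RDD operation, here a per-batch sort.
    val sorted = stream.transform((rdd: RDD[Int]) =>
      rdd.sortBy(x => x, ascending = false))

    // (RDD[T], Time) => RDD[U]: the batch time is passed alongside the RDD.
    val stamped = stream.transform((rdd: RDD[Int], time: Time) =>
      rdd.map(x => (time.milliseconds, x)))

    sorted.print()
    stamped.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(3000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

The parameter types are annotated explicitly here so that it is obvious which overload each call selects.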