Questions tagged [dstream]

Discretized Streams (D-Stream) is an approach that handles streaming computations as a series of deterministic batch computations on small time intervals. The input data received during each interval is stored reliably across the cluster to form an input dataset for that interval. Once the time interval completes, this dataset is processed via deterministic parallel operations, such as map, reduce, and groupBy, to produce new datasets representing program outputs or intermediate state.
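
The description above can be illustrated with a minimal sketch in plain Scala collections. It is not Spark Streaming code and does not model cluster storage or fault tolerance. It only shows the two steps named in the description: grouping incoming records by time interval, then running a deterministic batch job (map, groupBy, reduce) on each interval's dataset. The names `Event`, `MicroBatchSketch`, `batchIntervalMs`, and the sample records are hypothetical and chosen for illustration.

```scala
// Plain-Scala sketch of the D-Stream idea; no Spark dependency.
// Event, MicroBatchSketch, batchIntervalMs and the sample data are hypothetical.
final case class Event(timestampMs: Long, word: String)

object MicroBatchSketch {
  // Assumed batch interval of one second.
  val batchIntervalMs: Long = 1000L

  // Step 1: discretize the stream by assigning each event to the
  // interval in which it arrived, ordered by interval index.
  def discretize(events: Seq[Event]): Seq[(Long, Seq[Event])] =
    events
      .groupBy(e => e.timestampMs / batchIntervalMs)
      .toSeq
      .sortBy(_._1)

  // Step 2: a deterministic batch job on one interval's dataset,
  // here a word count built from map, groupBy and a sum (reduce).
  def wordCounts(batch: Seq[Event]): Map[String, Int] =
    batch
      .map(e => (e.word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  // Run the batch job once per interval, producing one output dataset each.
  def run(events: Seq[Event]): Seq[(Long, Map[String, Int])] =
    discretize(events).map { case (interval, batch) =>
      interval -> wordCounts(batch)
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event(100L, "a"), Event(450L, "b"), Event(900L, "a"),
      Event(1200L, "b"), Event(1800L, "b")
    )
    run(events).foreach { case (interval, counts) =>
      println(s"interval $interval: $counts")
    }
  }
}
```

Because each interval's output depends only on that interval's input, re-running the job on the same data yields the same result, which is the property the description refers to as deterministic.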

109 questions
0
votes
1 answer

mapValues function in DStream class Not Found

I want to make some modifications to the StreamingKMeans algorithm provided in Spark Streaming, so I created a project containing the necessary files, but unfortunately I cannot find the mapValues function in the DStream class! def predictOnValues[K:…
Momog
  • 567
  • 7
  • 27
0
votes
1 answer

Cartesian of DStream

I use the Spark cartesian function to generate a list of N pairs of values. I then map over these values to generate a distance metric between each of the users: val cartesianUsers: org.apache.spark.rdd.RDD[(distance.classes.User,…
blue-sky
  • 51,962
  • 152
  • 427
  • 752
-1
votes
1 answer

Scala: Splitting the data coming from Kafka via a DStream

I am receiving data from Kafka in the form of {"email":"test@example","firstname":"Example","lastname":"User"}. I want to access the email ID and first name and compare them with data coming from Cassandra in the form of :…
Anonymous
  • 29
  • 6
-1
votes
1 answer

Flatten joined DStream

I've joined some DStreams together, so that the current "datatype" of the DStream looks like this (key and values): DStream[(Long,((DateTime,Int),((Int,Double),Double)))] But I want to get: DStream[(Long,DateTime,Int,Int,Double,Double)] or…