
I have two DStreams. Let A:DStream[X] and B:DStream[Y].

I want to get the cartesian product of them, in other words, a new C:DStream[(X, Y)] containing all the pairs of X and Y values.

I know there is a `cartesian` method for RDDs. I was only able to find this similar question, but it is in Java, so it does not answer my question.

Coukaratcha
  • Of course. `A:DStream[(String, Int)]` is a collection of terms, each associated with a computed value. `B:DStream[Int]` is the result of the `count` function, so it contains only one integer value. I want to compute something using the integer value from A and the integer value from B. By taking the cartesian product of A and B, I will obtain a new DStream with the value from B added to each record of A, and I will be able to compute my result with a map. The order does not matter. I am very new to Spark and Scala, so feel free to suggest a better way if I am wrong. – Coukaratcha Jul 18 '16 at 11:08

1 Answer


The Scala equivalent of the linked question's answer (ignoring Time v3, which isn't used there) is

A.transformWith(B, (rddA: RDD[X], rddB: RDD[Y]) => rddA.cartesian(rddB))

or shorter

A.transformWith(B, (_: RDD[X]).cartesian(_: RDD[Y]))
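
For the use case described in the comment (attaching a per-batch count to every record), here is a minimal self-contained sketch of how this could be wired up. The object name `DStreamCartesian`, the helper `cartesian`, the sample data, and the `queueStream` input are all assumptions made for illustration. Note that `count()` on a DStream yields a `DStream[Long]`, so the sketch uses `Long` rather than `Int`.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object DStreamCartesian {
  // Hypothetical helper: per-batch cartesian product of two DStreams
  // that belong to the same StreamingContext.
  def cartesian[X, Y: ClassTag](a: DStream[X], b: DStream[Y]): DStream[(X, Y)] =
    a.transformWith(b, (rddA: RDD[X], rddB: RDD[Y]) => rddA.cartesian(rddB))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("dstream-cartesian")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed sample input: one batch of (term, value) records.
    val batches = mutable.Queue[RDD[(String, Int)]](
      ssc.sparkContext.parallelize(Seq(("spark", 3), ("scala", 5)))
    )
    val terms: DStream[(String, Int)] = ssc.queueStream(batches)

    // count() produces a single Long per batch.
    val total: DStream[Long] = terms.count()

    // Pair every record with the batch count, then combine them in a map.
    val scored: DStream[(String, Double)] =
      cartesian(terms, total).map { case ((term, value), n) => (term, value.toDouble / n) }

    scored.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run briefly, then shut down
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Since `B` holds a single value per batch here, the product has exactly as many records as `A`, so the `cartesian` call stays cheap. For two large streams, the per-batch product grows with the product of both batch sizes.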
Alexey Romanov