
Here is my code:

val ssc = new StreamingContext(sparkContext, Seconds(time))
val spark = SparkSession.builder.config(properties).getOrCreate()

val Dstream1: ReceiverInputDStream[Document] = ssc.receiverStream(properties) // Dstream1 has Id1 and other fields

val Rdd2 = spark.sql("select Id1,key from hdfs.table").rdd // RDD[Row]

Is there a way I can join these two?

1 Answer


You'll first want to transform your DStream and RDD into pair RDDs, keyed by Id1.

Something like this should do.

val DstreamTuple = Dstream1.map(x => (x.Id1, x))
// Rdd2 is an RDD[Row], so pull the key out with getAs (assuming Id1 is a String)
val Rdd2Tuple = Rdd2.map(x => (x.getAs[String]("Id1"), x))

Once you do that, you can apply a transformation on the DStream and join each batch RDD against the static RDD.

val joinedStream = DstreamTuple.transform(rdd =>
   rdd.leftOuterJoin(Rdd2Tuple)
)
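Putting the pieces together, a minimal sketch of the whole pipeline might look like this. It assumes Id1 is a String on both sides; `myReceiver`, `Document`, and the table name stand in for your own code and are illustrative, not real API objects:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumes an existing SparkSession and a custom receiver producing Document
// values (both come from your own setup; names here are placeholders).
val spark = SparkSession.builder.getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

val stream = ssc.receiverStream(myReceiver)              // DStream[Document]

val lookup = spark.sql("select Id1, key from hdfs.table").rdd
  .map((row: Row) => (row.getAs[String]("Id1"), row))    // RDD[(String, Row)]
  .cache()                                               // reused every batch

val joined = stream
  .map(doc => (doc.Id1, doc))                            // key the stream by Id1
  .transform(rdd => rdd.leftOuterJoin(lookup))           // DStream[(String, (Document, Option[Row]))]
```

Caching the lookup RDD avoids re-reading the table on every batch interval; if the table changes over time, you'd reload it inside `transform` instead.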

Hope this helps :)