I have a streaming job consuming from Kafka (using createDstream).
It is a stream of ids:
[id1,id2,id3 ..]
I have a utility (an API) which accepts an array of ids, makes an external call, and receives back some info, say "t", for each id:
[id1:t1,id2:t2,id3:t3...]
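To make the contract concrete, here is a minimal sketch of the utility's shape. The name `myutility` is the one I use in the snippets in this question; the `String` types, the `Map` return type, the `UtilitySketch` wrapper, and the stubbed lookup are all assumptions for illustration, since the real utility makes an external call.

```scala
// Hypothetical stand-in for the real utility: the String types, the Map
// return type and the stubbed lookup are assumptions for illustration only.
object UtilitySketch {
  // Accepts a whole batch of ids and returns one "t" per id.
  def myutility(ids: Array[String]): Map[String, String] = {
    // The real implementation makes a single external call for the batch;
    // here the response is faked so the sketch runs on its own.
    ids.map(id => id -> s"t-for-$id").toMap
  }

  def main(args: Array[String]): Unit = {
    val result = myutility(Array("id1", "id2", "id3"))
    result.foreach { case (id, t) => println(s"$id:$t") }
  }
}
```

The point is that one call handles many ids, so calling it once per element defeats its purpose.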
I want to retain the DStream while calling the utility. I can't use a map transformation on the DStream, as it would make a call for each id, and moreover the utility accepts a collection of ids.
Dstream.map(x=> myutility(x)) -- ruled out
And if I use
Dstream.foreachRDD(rdd => myutility(rdd.collect.toArray))
I lose the DStream. I need to retain the DStream for downstream processing.