I have a Spark Streaming process which reads data from Kafka into a DStream.
In my pipeline I call, one after the other, two times:
DStream.foreachRDD( transformations on the RDD and inserting into a destination ).
(Each call does different processing and inserts the data into a different destination.)
I was wondering how DStream.cache would work if I called it right after reading the data from Kafka. Is it possible to do that?
Is the process currently reading the data from Kafka twice?
Please keep in mind that it is not possible to merge the two foreachRDD calls into one (the two paths are quite different, and there are stateful transformations that need to be applied on the DStream...).
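For illustration, here is roughly the structure I have in mind (a minimal sketch using the spark-streaming-kafka-0-10 direct stream API; the topic names, destinations, and write calls are placeholders, not my actual code):

```scala
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Read from Kafka once into a DStream (topics/kafkaParams are placeholders).
val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
)

val records = kafkaStream.map(_.value())

// Question: does caching here stop the second pipeline from re-reading Kafka?
records.cache()

// Path 1: its own (stateful) transformations, written to destination A.
records.foreachRDD { rdd =>
  // ... path-1 transformations ...
  // writeToDestinationA(rdd)   // placeholder
}

// Path 2: different transformations, written to destination B.
records.foreachRDD { rdd =>
  // ... path-2 transformations ...
  // writeToDestinationB(rdd)   // placeholder
}
```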
Thanks for your help