
Is there a way to execute multiple actions asynchronously/in parallel in Spark Streaming? Here is my code:

// Static imports from the Spark Cassandra connector:
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

positions.foreachRDD(rdd -> {
    // Each saveToCassandra() is a blocking action, so these four writes run one after another
    JavaRDD<A> pbv = rdd.map(p -> A.create(p));
    javaFunctions(pbv).writerBuilder("poc", "table_a", mapToRow(A.class)).saveToCassandra();

    JavaRDD<D> pbd = rdd.map(p -> D.create(p));
    javaFunctions(pbd).writerBuilder("poc", "table_d", mapToRow(D.class)).saveToCassandra();

    JavaRDD<L> pblv = rdd.map(p -> L.create(p));
    javaFunctions(pblv).writerBuilder("poc", "table_l", mapToRow(L.class)).saveToCassandra();

    JavaRDD<V> pbld = rdd.map(p -> V.create(p));
    javaFunctions(pbld).writerBuilder("poc", "table_v", mapToRow(V.class)).saveToCassandra();
});

I would like to run the saveToCassandra actions in parallel. Is this possible with Spark techniques, or only via self-made Thread/Executor handling?

Thanks for your help!

Regards, Markus

mananana
  • Not exactly an answer to your question, but the fastest way to execute these inserts is to do `rdd.foreachPartition`, create `PreparedStatements` using the Java driver and submit them with `session.executeAsync` (a sketch follows below). – maasg Jun 16 '16 at 13:53
  • Thanks for the hint, I will test your approach if the DStream approach from @RussS doesn't work. – mananana Jun 17 '16 at 06:53
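For reference, a minimal sketch of the approach maasg describes, assuming the connector's CassandraConnector for session handling and a hypothetical table_a schema (id, value) with matching getters on A:

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;

// sc is the JavaSparkContext; CassandraConnector is serializable
CassandraConnector connector = CassandraConnector.apply(sc.getConf());

positions.foreachRDD(rdd -> {
    rdd.foreachPartition(partition -> {
        // One session per partition; close() waits for in-flight requests to finish
        try (Session session = connector.openSession()) {
            // Hypothetical schema: adjust the table and columns to your model
            PreparedStatement insert = session.prepare(
                "INSERT INTO poc.table_a (id, value) VALUES (?, ?)");
            partition.forEachRemaining(p -> {
                A a = A.create(p);
                // executeAsync() returns immediately, so the inserts overlap
                session.executeAsync(insert.bind(a.getId(), a.getValue()));
            });
        }
    });
});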

1 Answer


@maasg's comment is probably the fastest approach to this kind of forking, but if you don't want that, you can more easily do it by just splitting the stream.

Something like

// javaFunctions here is the streaming variant (com.datastax.spark.connector.japi.CassandraStreamingJavaUtil)
javaFunctions(positions.map(p -> A.create(p))).writerBuilder("poc", "table_a", mapToRow(A.class)).saveToCassandra();
javaFunctions(positions.map(p -> D.create(p))).writerBuilder("poc", "table_d", mapToRow(D.class)).saveToCassandra();
javaFunctions(positions.map(p -> L.create(p))).writerBuilder("poc", "table_l", mapToRow(L.class)).saveToCassandra();
javaFunctions(positions.map(p -> V.create(p))).writerBuilder("poc", "table_v", mapToRow(V.class)).saveToCassandra();

By acting on the DStream, all of these requests will happen in parallel.
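One caveat worth knowing: Spark Streaming's scheduler executes the output operations of a batch one at a time by default, so if the four saves still run sequentially you may also need to raise the undocumented spark.streaming.concurrentJobs setting. A minimal sketch, assuming you set it when building the context:

SparkConf conf = new SparkConf()
    .set("spark.streaming.concurrentJobs", "4"); // allow up to 4 streaming jobs to run concurrently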

– RussS