
I want to do something like this:

```scala
import org.apache.spark.rdd.RDD

val myBigRdd1: RDD[_] = ???
val myBigRdd2: RDD[_] = ???
myBigRdd1.mapPartitions { dataBlock =>
  // operation involving dataBlock and another RDD,
  // e.g. myBigRdd2.multiply(dataBlock) if myBigRdd2 is a matrix,
  // or something similar
  ???
}
```

Is there a way to give an RDD to the executors?

I think broadcasting `myBigRdd2` won't work because it is too big.

Calling `collect` and then `grouped` on `myBigRdd1` won't work either, because the driver would run out of memory.

Is there any other way?

`cartesian` works, but it takes forever.
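Since `cartesian` does work, one way to make it cheaper, sketched here under assumptions, is to pair whole partitions instead of single elements. `glom()` turns each partition into an array, so `rdd1.glom().cartesian(rdd2.glom())` yields one `(Array, Array)` record per partition pair, which is close to the `dataBlock` the question wants and avoids materializing one record per element pair. The names `BlockCartesian`, `blockPairs` and `combine` are hypothetical, and each partition of the second RDD still has to fit in executor memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object BlockCartesian {
  // Pair every partition of `left` with every partition of `right` as whole
  // arrays (one record per partition pair) instead of one record per element pair.
  // `combine` is a hypothetical caller-supplied operation on two blocks.
  def blockPairs[A, B: ClassTag, C: ClassTag](left: RDD[A], right: RDD[B])(
      combine: (Array[A], Array[B]) => Iterator[C]): RDD[C] =
    left.glom().cartesian(right.glom()).flatMap { case (blockA, blockB) =>
      combine(blockA, blockB)
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("block-cartesian"))
    try {
      val rdd1 = sc.parallelize(1 to 4, numSlices = 2)
      // Cached because each partition of rdd2 is read once per partition of rdd1.
      val rdd2 = sc.parallelize(1 to 3, numSlices = 3).persist()
      // Toy block operation: sum of x * y over every pair drawn from the two blocks.
      val partial = blockPairs(rdd1, rdd2) { (xs, ys) =>
        Iterator.single(xs.map(x => ys.map(y => x * y).sum).sum)
      }
      println(partial.reduce(_ + _)) // 60 = (1 + 2 + 3 + 4) * (1 + 2 + 3)
    } finally {
      sc.stop()
    }
  }
}
```

The number of block pairs is the product of the two partition counts, so fewer, larger partitions on the second RDD reduce the amount of repeated reading.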

Wonay
  • No, there is not. But possibly related: [Matrix Multiplication in Apache Spark](https://stackoverflow.com/q/33558755/9613318) – Alper t. Turker May 17 '18 at 21:41
  • I kind of found a way using `randomSplit`, which gives an `Array[RDD]`. I can then collect each smaller RDD to create a new RDD, sequentially so as not to overload the driver, and then `union` those. – Wonay May 18 '18 at 00:52
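For the matrix case mentioned in the question, Spark MLlib already ships distributed matrices that multiply two RDD-backed matrices without passing one into the other's closure. A minimal sketch, assuming the `spark-mllib` dependency is on the classpath and the data can be expressed as `(row, col, value)` entries: wrap each side in a `CoordinateMatrix`, convert both to `BlockMatrix` with matching block sizes, and call `multiply`. The object `BlockMultiply` and its helpers are illustrative names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}

object BlockMultiply {
  // Wrap (row, col, value) triples in MLlib's CoordinateMatrix.
  def toCoordinate(sc: SparkContext, entries: Seq[(Long, Long, Double)]): CoordinateMatrix =
    new CoordinateMatrix(sc.parallelize(entries.map { case (i, j, v) => MatrixEntry(i, j, v) }))

  // BlockMatrix.multiply requires A's colsPerBlock to equal B's rowsPerBlock,
  // so both sides use the same square block size here.
  def multiply(a: CoordinateMatrix, b: CoordinateMatrix, blockSize: Int): BlockMatrix =
    a.toBlockMatrix(blockSize, blockSize).multiply(b.toBlockMatrix(blockSize, blockSize))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("block-multiply"))
    try {
      // A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] as sparse entries.
      val a = toCoordinate(sc, Seq((0L, 0L, 1.0), (0L, 1L, 2.0), (1L, 0L, 3.0), (1L, 1L, 4.0)))
      val b = toCoordinate(sc, Seq((0L, 0L, 5.0), (0L, 1L, 6.0), (1L, 0L, 7.0), (1L, 1L, 8.0)))
      // Block size 1 only because the example is tiny; real data would use larger blocks.
      println(multiply(a, b, blockSize = 1).toLocalMatrix())
    } finally {
      sc.stop()
    }
  }
}
```

`toLocalMatrix()` is used here only to inspect a tiny result; for large matrices the product should stay distributed.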

1 Answer


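You cannot pass an RDD into `mapPartitions`, because an RDD is a structure known only to the driver. Code running on the executors cannot use another RDD: RDD transformations and actions can only be invoked by the driver, not inside other transformations (see SPARK-5063).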
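The usual way to combine two RDDs on the executors is to key both and use a pair operation such as `join` or `cogroup`, so Spark shuffles matching records together instead of one RDD referencing the other. A minimal sketch for the matrix-multiplication example in the question, assuming both matrices are stored as hypothetical `(row, col, value)` triples: key `A` by its column index and `B` by its row index, join on that shared index `k`, multiply, and sum per output cell, i.e. C(i, j) = Σ_k A(i, k) · B(k, j). The object and method names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinMultiply {
  // Hypothetical representation: a sparse matrix as (row, col, value) triples.
  type Entries = RDD[(Long, Long, Double)]

  // C(i, j) = sum over k of A(i, k) * B(k, j), computed without nesting RDDs:
  // both sides are keyed by the shared index k and joined on the executors.
  def multiply(a: Entries, b: Entries): RDD[((Long, Long), Double)] = {
    val aByK = a.map { case (i, k, v) => (k, (i, v)) } // A keyed by its column index
    val bByK = b.map { case (k, j, v) => (k, (j, v)) } // B keyed by its row index
    aByK.join(bByK)
      .map { case (_, ((i, av), (j, bv))) => ((i, j), av * bv) } // one partial product per match
      .reduceByKey(_ + _)                                          // sum the partial products over k
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("join-multiply"))
    try {
      val a = sc.parallelize(Seq((0L, 0L, 1.0), (0L, 1L, 2.0), (1L, 0L, 3.0), (1L, 1L, 4.0)))
      val b = sc.parallelize(Seq((0L, 0L, 5.0), (0L, 1L, 6.0), (1L, 0L, 7.0), (1L, 1L, 8.0)))
      multiply(a, b).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

For dense data the join emits one record per matching pair of entries, which can be large; MLlib's `BlockMatrix.multiply` applies a similar idea at the level of blocks rather than single entries.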

Anil