I want to do something like this:
val myBigRdd1: RDD[_] = ???
val myBigRdd2: RDD[_] = ???
myBigRdd1.mapPartitions { dataBlock =>
  // operation involving dataBlock and another RDD,
  // like myBigRdd2.multiply(dataBlock)
  // if myBigRdd2 is a matrix, or something similar
}
Is there a way of giving an RDD to the executors?
I think a Broadcast of rdd2 won't work because it is too big.
And doing collect and grouped on rdd1 won't work either, because the driver memory will blow up.
Is there any other way? cartesian works, but it takes forever.