
I'm trying to figure out whether something like SUMMA (http://www.cs.utexas.edu/ftp/techreports/tr95-13.pdf) is possible on Spark.

Is it possible to access low-level RDD functionality and data distribution in the same way as with MPI? (The key concepts for SUMMA are a 2D process topology and row/column broadcasts.)

I've seen simple matrix multiplication in Spark, but it doesn't seem to come close to SUMMA's efficiency.

Thanks!


1 Answer


My schoolmates and I have built a distributed matrix library on top of Spark: Marlin (https://github.com/PasaLab/marlin). The matrix multiplication algorithm implemented in our library is based on CARMA (http://www.eecs.berkeley.edu/~odedsc/papers/bfsdfs-mm-ipdps13.pdf).

At first, we looked into the SUMMA algorithm. However, sending submatrices along the processor rows and columns during each iteration is quite difficult to implement with Spark's API. Recently, we implemented a mechanism similar to MPI send and receive in Spark using TorrentBroadcast, which required modifying the Spark core code. I think that with this strategy it is possible to implement SUMMA in Spark, but fault tolerance and scalability may be a problem.
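To make the communication pattern concrete, here is a minimal single-process sketch of the block form of SUMMA on a q x q process grid. It is not Marlin code and does not use Spark: the function names (`summa`, `split_blocks`, `join_blocks`, and so on) are hypothetical, the matrices are assumed to be square with a size divisible by q, and the row and column "broadcasts" are only simulated by building lists of blocks.

```python
# Single-process simulation of block SUMMA on a q x q process grid.
# Grid position (i, j) stands for the process that owns blocks A[i][j], B[i][j], C[i][j].
# All names here are illustrative assumptions, not part of Spark or Marlin.

def matmul(x, y):
    """Plain dense product of two list-of-lists matrices."""
    return [[sum(x[r][t] * y[t][c] for t in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

def add_into(acc, x):
    """Accumulate x into acc in place."""
    for r in range(len(acc)):
        for c in range(len(acc[0])):
            acc[r][c] += x[r][c]

def split_blocks(m, q):
    """Split a square n x n matrix into a q x q grid of blocks (n divisible by q)."""
    b = len(m) // q
    return [[[row[j * b:(j + 1) * b] for row in m[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]

def join_blocks(blocks):
    """Reassemble a grid of blocks into one matrix (inverse of split_blocks)."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([v for blk in block_row for v in blk[r]])
    return out

def summa(a, b, q):
    """Compute a * b by simulating q SUMMA steps on a q x q grid."""
    bs = len(a) // q
    A = split_blocks(a, q)
    B = split_blocks(b, q)
    C = [[[[0] * bs for _ in range(bs)] for _ in range(q)] for _ in range(q)]
    for k in range(q):
        # Row broadcast: the owner of A[i][k] sends it to every process in grid row i.
        row_panel = [A[i][k] for i in range(q)]
        # Column broadcast: the owner of B[k][j] sends it to every process in grid column j.
        col_panel = [B[k][j] for j in range(q)]
        # Every process (i, j) updates its local C block with the two received blocks.
        for i in range(q):
            for j in range(q):
                add_into(C[i][j], matmul(row_panel[i], col_panel[j]))
    return join_blocks(C)
```

The two lines that build `row_panel` and `col_panel` are the part that is hard to express with Spark's API: in MPI they are broadcasts within a row or column communicator, repeated once per step, whereas here they are just local list lookups.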

Yun Tang