How can I multiply two distributed MLlib matrices in Scala and get the result back, on a standalone Spark cluster with 9 worker machines and 1 driver machine? There are 27 workers in total, i.e. 3 workers per worker machine, each with two cores. The multiplication should be done between corresponding partitions, i.e. the 1st partition of matrix A with the 1st partition of matrix B, and so on. I am planning for 27 partitions.
The product of the matrices should be received partition-wise, and I also need to maintain an equal number of records in each partition. Matrix A is the smaller one, but matrix B is too large to fit into the memory of a single machine. The goal is to apply further transformations to the partition-wise product of matrix A and matrix B.
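To make the intent concrete, here is a minimal sketch of the partition-wise product I am after (pairwiseMultiply, rddA and rddB are hypothetical names of mine), assuming both RDDs hold one local block per partition and are co-partitioned:
import org.apache.spark.mllib.linalg.{DenseMatrix, Matrix}
import org.apache.spark.rdd.RDD
//hypothetical sketch: partition i of rddA is multiplied with partition i of rddB;
//this only works if both RDDs have the same number of partitions and the same
//number of matrices in each corresponding partition
def pairwiseMultiply(rddA: RDD[Matrix], rddB: RDD[DenseMatrix]): RDD[DenseMatrix] =
  rddA.zipPartitions(rddB) { (aIter, bIter) =>
    aIter.zip(bIter).map { case (a, b) => a.multiply(b) }
  }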
Let me clarify with the following code, which creates two block matrices.
import org.apache.spark.mllib.linalg.Matrices
import org.apache.spark.mllib.linalg.distributed.BlockMatrix
//creation of blocks as local matrices which are components of the first block matrix
val eye1 = Matrices.dense(3, 2, Array(1, 2, 3, 4, 5, 6))
val eye2 = Matrices.dense(3, 2, Array(4, 5, 6, 7, 8, 9))
val eye3 = Matrices.dense(3, 2, Array(7, 8, 9, 1, 2, 3))
val eye4 = Matrices.dense(3, 2, Array(4, 5, 6, 1, 2, 3))
val blocks = sc.parallelize(Seq(
  ((0, 0), eye1), ((1, 1), eye2), ((2, 2), eye3), ((3, 3), eye4)), 4)
//block matrix created with 3 rows per block and 2 columns per block
val blockMatrix = new BlockMatrix(blocks, 3, 2)
//creation of blocks as local matrices which are components of the second block matrix
val eye5 = Matrices.dense(2, 4, Array(1, 2, 3, 4, 5, 6, 7, 8))
val eye6 = Matrices.dense(2, 4, Array(2, 4, 6, 8, 10, 12, 14, 16))
val eye7 = Matrices.dense(2, 4, Array(3, 6, 9, 12, 15, 18, 21, 24))
val eye8 = Matrices.dense(2, 4, Array(4, 8, 12, 16, 20, 24, 28, 32))
val blocks1 = sc.parallelize(Seq(
  ((0, 0), eye5), ((1, 1), eye6), ((2, 2), eye7), ((3, 3), eye8)), 4)
//block matrix created with 2 rows per block and 4 columns per block
val blockMatrix1 = new BlockMatrix(blocks1, 2, 4)
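Before multiplying, a consistency check may help; if I am not mistaken, BlockMatrix.validate() throws an exception when block sizes or dimensions are inconsistent:
//sanity-check both block matrices before the multiply; validate() fails fast
//on inconsistent block dimensions
blockMatrix.validate()
blockMatrix1.validate()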
//The following line multiplies the block matrices
val blockProduct = blockMatrix.multiply(blockMatrix1)
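As far as I know, multiply repartitions its result with a GridPartitioner of its own, so the product's blocks need not keep the 4-partition layout of the inputs. A small diagnostic (my own addition, not part of the pipeline) makes the layout visible:
//inspect the product's partitioning: number of partitions and blocks per partition
println(blockProduct.blocks.getNumPartitions)
blockProduct.blocks.glom().map(_.length).collect().foreach(println)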
//extract the (rowIndex, colIndex) pairs of the blocks into an RDD
val blockMatrixIndex = blockProduct.blocks.map{
  case((a,b),m) => (a,b)}
val (blockRowIndexMaxValue, blockColIndexMaxValue) = blockMatrixIndex.max()
//extract the data blocks (local matrices) of the block matrix into an RDD
val blockMatrixRDD = blockProduct.blocks.map{
  case((a,b),m) => m}
//double every element of each block
val blockMatrixRDDElementDoubled = blockMatrixRDD.map(x => x.toArray.map(y => 2 * y))
//find the number of rows and columns of an individual block in the block matrix
val blockMatRowCount = blockMatrixRDD.map(x => x.numRows).first
val blockMatColCount = blockMatrixRDD.map(x => x.numCols).first
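Taking first here assumes that every block has the same shape; a guard I could add (my own addition) to check that assumption:
//guard: all blocks must share one shape, otherwise Matrices.dense below
//gets wrong dimensions for some blocks
require(blockMatrixRDD.map(m => (m.numRows, m.numCols)).distinct().count() == 1)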
//rebuild each data block as a local dense matrix (toArray yields column-major values, which is what Matrices.dense expects)
val blockMatrixBlockRecreated = blockMatrixRDDElementDoubled.map(x => Matrices.dense(blockMatRowCount, blockMatColCount, x))
//generate the (i, i) index sequence for the blocks of the block matrix
val indexRange = List.range(0, blockRowIndexMaxValue + 1)
val indexSeq = indexRange zip indexRange
//parallelize the index sequence into blockRowIndexMaxValue + 1 (here 4) partitions
val indexSeqRDD = sc.parallelize(indexSeq, blockRowIndexMaxValue + 1)
//attempt to regenerate the block matrix in RDD form by zipping indices with blocks
val completeBlockMatrixRecreated = indexSeqRDD.zip(blockMatrixBlockRecreated)
completeBlockMatrixRecreated is of type org.apache.spark.rdd.RDD[((Int, Int), org.apache.spark.mllib.linalg.Matrix)], so it should contain 4 blocks.
However, if I try to execute
completeBlockMatrixRecreated.take(2)
it fails with the error "org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition".
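From what I understand, the zip fails because indexSeqRDD has exactly one element in each of its 4 partitions, while blockMatrixBlockRecreated keeps whatever layout multiply's GridPartitioner produced. As a sketch of a possible workaround (my own idea, completeBlockMatrixAlt is a hypothetical name), I could keep the (row, column) key attached to each block instead of rebuilding it with zip:
//zip-free alternative: carry the block index through the transformation,
//so no alignment between two separately partitioned RDDs is needed
val completeBlockMatrixAlt = blockProduct.blocks.map { case ((i, j), m) =>
  ((i, j), Matrices.dense(m.numRows, m.numCols, m.toArray.map(_ * 2.0)))
}
But I am not sure whether this gives me the equal number of records per partition that I need, hence the question above.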