I have a distributed matrix X in RowMatrix form. I am using Spark 1.3.0. I need to be able to calculate the inverse of X.
-
One algorithm is described in https://arxiv.org/pdf/1801.04723.pdf – Andrew Sep 28 '19 at 02:07
3 Answers
import org.apache.spark.mllib.linalg.{Vectors, Vector, Matrix, SingularValueDecomposition, DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def computeInverse(X: RowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error("RowMatrix.computeInverse called on singular matrix.")
  }

  // Build inv(S) as a diagonal matrix from the reciprocals of the singular values
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

  // U comes back as a distributed RowMatrix, which cannot be multiplied locally,
  // so collect it into a local DenseMatrix. Flattening the rows into the
  // column-major DenseMatrix constructor yields U already transposed.
  val U = new DenseMatrix(svd.U.numRows().toInt, svd.U.numCols().toInt,
    svd.U.rows.collect.flatMap(x => x.toArray))

  // V is already local; if it could be kept distributed this might scale better,
  // but since it is local anyway, this is probably fine.
  val V = svd.V

  // inv(X) = V * inv(S) * transpose(U) --- the U above is already transposed
  (V.multiply(invS)).multiply(U)
}
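A small usage sketch (my own illustration, assuming the `computeInverse` above is in scope and Spark runs in local mode; the 2 x 2 values are arbitrary):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val sc = new SparkContext(new SparkConf().setAppName("inverse-demo").setMaster("local[2]"))

// X = [[4, 3], [6, 3]]; det = -6, so the exact inverse is [[-0.5, 0.5], [1.0, -2/3]]
val rows = sc.parallelize(Seq(Vectors.dense(4.0, 3.0), Vectors.dense(6.0, 3.0)))
val X = new RowMatrix(rows)

val Xinv = computeInverse(X) // local DenseMatrix; Xinv(0, 0) should be close to -0.5
```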

I had problems using this function with the option

conf.set("spark.sql.shuffle.partitions", "12")

because the rows of the RowMatrix got shuffled out of order. Here is an update that worked for me, using an IndexedRowMatrix so that every row keeps its index:
import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

def computeInverse(X: IndexedRowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error("IndexedRowMatrix.computeInverse called on singular matrix.")
  }

  // Build inv(S) as a diagonal matrix from the reciprocals of the singular values
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

  // U is distributed; collecting it via BlockMatrix respects the row indices,
  // so the rows stay in order. Multiplying by the identity converts the result
  // to a multipliable DenseMatrix before transposing.
  val U = svd.U.toBlockMatrix().toLocalMatrix()
    .multiply(DenseMatrix.eye(svd.U.numRows().toInt)).transpose

  val V = svd.V

  // inv(X) = V * inv(S) * transpose(U)
  (V.multiply(invS)).multiply(U)
}
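For illustration, the same kind of small example adapted to IndexedRowMatrix (my own sketch, assuming the `computeInverse` above is in scope and a local SparkContext; the explicit row indices are what protect against shuffling):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

val sc = new SparkContext(new SparkConf().setAppName("inverse-demo").setMaster("local[2]"))

// Each row carries its own index, so a shuffle cannot silently reorder the matrix
val indexedRows = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(4.0, 3.0)),
  IndexedRow(1L, Vectors.dense(6.0, 3.0))))
val X = new IndexedRowMatrix(indexedRows)

val Xinv = computeInverse(X) // exact inverse of [[4, 3], [6, 3]] is [[-0.5, 0.5], [1.0, -2/3]]
```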

The matrix U returned by X.computeSVD has dimensions m x k, where m is the number of rows of the original (distributed) RowMatrix X. One would expect m to be large (possibly larger than k), so it is not advisable to collect U in the driver if we want our code to scale to really large values of m. I would say both of the other solutions suffer from this flaw. The answer given by @Alexander Kharlamov calls val U = svd.U.toBlockMatrix().toLocalMatrix(), which collects the matrix in the driver. The same happens with the answer given by @Climbs_lika_Spyder (btw your nick rocks!!), which calls svd.U.rows.collect.flatMap(x => x.toArray). I would rather suggest relying on a distributed matrix multiplication, such as the Scala code posted here.
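One possible sketch of that idea (an assumption on my part, not the code from the link): since V * inv(S) is only k x k, it can be pushed through the distributed U with RowMatrix.multiply, so U is never collected. Because inv(X) = V * inv(S) * transpose(U) = transpose(U * transpose(V * inv(S))), the distributed result is the transpose of inv(X):

```scala
import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Sketch: compute the inverse without ever collecting U in the driver.
// Returns transpose(inv(X)) as a distributed RowMatrix.
def computeInverseTransposed(X: RowMatrix): RowMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error("computeInverseTransposed called on singular matrix.")
  }
  // inv(S) from the reciprocals of the singular values
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(1.0 / _)))
  // k x k local factor; small enough to ship to every executor
  val VinvS = svd.V.multiply(invS)
  // Distributed product: row i of the result holds column i of inv(X)
  svd.U.multiply(VinvS.transpose)
}
```

The caller then works with the transposed rows directly, or transposes (e.g. via BlockMatrix) only when a different layout is genuinely needed.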

- 328
- 2
- 16
-
I do not see any inverse calculations at the link you added. – Climbs_lika_Spyder Nov 04 '16 at 21:55
-
@Climbs_lika_Spyder The link is about distributed matrix multiplication to replace the local matrix multiplication `(V.multiply(invS)).multiply(U)` in the last line of your solution, so that you do not need to collect `U` in the driver. I think `V` and `invS` are not big enough to cause problems. – Pablo Nov 07 '16 at 08:18