
I am trying to invert a Spark RowMatrix. The function I am using is below.

import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, IndexedRow, IndexedRowMatrix, RowMatrix}

def computeInverse(matrix: RowMatrix): BlockMatrix = {
  val numCoefficients = matrix.numCols.toInt
  val svd = matrix.computeSVD(numCoefficients, computeU = true)

  // Attach row indices so U can be multiplied as an IndexedRowMatrix
  val indexed_U = new IndexedRowMatrix(svd.U.rows.zipWithIndex.map(r => new IndexedRow(r._2, r._1)))
  // Invert the nonzero singular values; zeros stay zero (pseudoinverse convention)
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => if (x == 0) 0 else math.pow(x, -1))))
  // X = V * Σ.inverse (a local matrix)
  val X = svd.V.multiply(invS)

  // A.inverse = X * U.transpose = (U * X.transpose).transpose, computed distributed
  val inverse = indexed_U.multiply(X.transpose)
  inverse.toBlockMatrix.transpose
}

The logic I am implementing uses the SVD. An explanation of the process:

U, Σ, V = svd(A)
A = U * Σ * V.transpose
A.inverse = (U * Σ * V.transpose).inverse
         = (V.transpose).inverse * Σ.inverse * U.inverse

Now U and V are orthogonal matrices.
Therefore, for an orthogonal matrix M,
         M * M.transpose = M.transpose * M = I (the identity)
Applying the above,

A.inverse = V * Σ.inverse * U.transpose

Let V * Σ.inverse be X

A.inverse = X * U.transpose

Now, A * B = ((A * B).transpose).transpose
           = (B.transpose * A.transpose).transpose

Applying the same identity, so that U stays a distributed row matrix rather than becoming a local matrix:
A.inverse = X * U.transpose
          = (U.transpose.transpose * X.transpose).transpose
          = (U * X.transpose).transpose
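The derivation above can be sanity-checked in numpy (a minimal sketch, using an arbitrary small matrix of my own choosing, not from the question):

```python
import numpy as np

# Rebuild A.pinv = V * inv(Sigma) * U.T from the SVD factors and
# compare against numpy's own pseudoinverse.
A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Invert only the nonzero singular values (zeros stay zero, as in the Scala code)
s_inv = np.array([0.0 if x == 0 else 1.0 / x for x in s])
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

assert np.allclose(A_pinv, np.linalg.pinv(A))
```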

The problem is with the input row matrix. For example, given

1, 2, 3
4, 5, 6
7, 8, 9
10,11,12

the inverse computed by the above code snippet differs from the one Python's numpy produces. I am unable to find out why this is so. Is it because of some underlying assumption made during the SVD calculation? Any help will be greatly appreciated. Thanks.
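For reference, the numpy side of the comparison might look like the sketch below. Note that this matrix is not square (and is only rank 2), so what both sides compute is a Moore-Penrose pseudoinverse rather than a true inverse:

```python
import numpy as np

# The 4x3 example matrix from the question
A = np.array([[ 1.0,  2.0,  3.0],
              [ 4.0,  5.0,  6.0],
              [ 7.0,  8.0,  9.0],
              [10.0, 11.0, 12.0]])

A_pinv = np.linalg.pinv(A)

# The defining pseudoinverse property holds even though A is rank-deficient
assert np.allclose(A @ A_pinv @ A, A)
```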

Debasish
  • Have you checked [this](http://stackoverflow.com/questions/29969521/how-to-compute-the-inverse-of-a-rowmatrix-in-apache-spark) out? – evan.oman Jun 22 '16 at 15:18
  • Yes, but that algo too doesn't match the results I get using numpy inverse in python. – Debasish Jun 22 '16 at 15:44

1 Answer


The above code works properly. The reason I was getting this discrepancy is in how I made the RowMatrix from an RDD[Vector]. In Spark, the values are laid out column-wise to form a matrix, whereas numpy converts an array row-wise into a matrix. So, given

Array(1,2,3,4,5,6,7,8,9)

In Spark

1 4 7
2 5 8
3 6 9

In python, it is interpreted as

1 2 3
4 5 6
7 8 9
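The two layouts can be reproduced in numpy itself (a small sketch): the default C order fills row by row, while Fortran order (`order='F'`) fills column by column, matching the Spark layout described above.

```python
import numpy as np

a = np.arange(1, 10)  # [1, 2, ..., 9]

# numpy default: row-major (C order)
row_major = a.reshape(3, 3)

# column-major (Fortran order) reproduces the column-wise filling
col_major = a.reshape(3, 3, order='F')

assert row_major[0].tolist() == [1, 2, 3]
assert col_major[0].tolist() == [1, 4, 7]
```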

So, the test case was failing :|
