I'm working on SVD using PySpark, but neither the documentation nor anywhere else I've looked explains how to reconstruct the original matrix from the individual factors. For example, using PySpark's SVD I got the U, s and V factors as below.
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix
rows = sc.parallelize([
    Vectors.sparse(5, {1: 1.0, 3: 7.0}),
    Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
    Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
])
mat = RowMatrix(rows)
# Compute the top 5 singular values and corresponding singular vectors.
svd = mat.computeSVD(5, computeU=True)
U = svd.U # The U factor is a RowMatrix.
s = svd.s # The singular values are stored in a local dense vector.
V = svd.V # The V factor is a local dense matrix.
Now I want to reconstruct the original matrix by multiplying the factors back together. The equation is:
mat_cal = U * diag(s) * V^T
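In plain Python (NumPy) we can do this easily. For example, a minimal local sketch that collects the factors to the driver:

import numpy as np

# Pull the distributed U down to the driver as an m x k array.
U_local = np.array([row.toArray() for row in U.rows.collect()])
# Local product: (m x k) . (k x k) . (k x n) = m x n
mat_cal = U_local.dot(np.diag(s.toArray())).dot(V.toArray().T)
print(np.round(mat_cal, 2))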
But with the distributed RowMatrix in PySpark I'm not getting the result. I found this link, but it's in Scala and I don't know how to convert it to PySpark.
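The closest thing I can see in the PySpark API is RowMatrix.multiply (available since Spark 2.2), which right-multiplies a RowMatrix by a local dense matrix. So I'd expect something along these lines to work, though this is just a sketch of my understanding, not a confirmed solution:

import numpy as np
from pyspark.mllib.linalg import DenseMatrix

# Build diag(s) * V^T as a local k x n matrix.
sVT = np.diag(s.toArray()).dot(V.toArray().T)
k, n = sVT.shape
# DenseMatrix expects its values in column-major order.
sVT_local = DenseMatrix(k, n, sVT.flatten(order='F'))

# Right-multiply the distributed U (m x k) by the local k x n matrix.
mat_cal = U.multiply(sVT_local)  # RowMatrix, m x n

Is something like this the intended approach? If someone can guide me, it would be very helpful.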
Thanks!