I'm working on SVD using PySpark, but neither in the documentation nor anywhere else could I find how to reconstruct the original matrix from the decomposed factors. For example, using PySpark's SVD, I got the U, s and V matrices as below.

from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix
rows = sc.parallelize([
    Vectors.sparse(5, {1: 1.0, 3: 7.0}),
    Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
    Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
])

mat = RowMatrix(rows)

# Compute the top 5 singular values and corresponding singular vectors.
svd = mat.computeSVD(5, computeU=True)
U = svd.U       # The U factor is a RowMatrix.
s = svd.s       # The singular values are stored in a local dense vector.
V = svd.V       # The V factor is a local dense matrix.

Now I want to reconstruct the original matrix by multiplying the factors back together. The equation is:

mat_cal = U · diag(s) · V^T
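Locally, with NumPy, this identity is a one-liner. The sketch below uses `np.linalg.svd` with `full_matrices=False` on the same 3×5 example matrix, written out densely. Note one difference from Spark: NumPy returns Vᵀ directly (called `Vt` here), whereas Spark's `svd.V` is V itself, with shape n×k.

```python
import numpy as np

# The same 3x5 example matrix as above, written out densely.
A = np.array([
    [0.0, 1.0, 0.0, 7.0, 0.0],
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
])

# Thin SVD: U is 3x3, s has 3 entries, Vt (already V^T) is 3x5.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A = U . diag(s) . V^T.
A_rec = U @ np.diag(s) @ Vt

print(np.allclose(A, A_rec))  # True
```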

In plain Python (NumPy) this is easy, but in PySpark I'm not getting the result. I found this link, but it's in Scala and I don't know how to convert it to PySpark. If someone can guide me, it would be very helpful.

Thanks!

1 Answer

Convert s to a diagonal matrix Σ:

import numpy as np
from pyspark.mllib.linalg import DenseMatrix

# DenseMatrix expects its values in column-major ("F") order.
Σ = DenseMatrix(len(s), len(s), np.diag(s.toArray()).ravel("F"))

Transpose V, flatten it in column-major order, and wrap it back in a DenseMatrix:

V_ = DenseMatrix(V.numCols, V.numRows, V.toArray().transpose().ravel("F"))
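`pyspark.mllib.linalg.DenseMatrix(numRows, numCols, values)` stores `values` in column-major order, which is why both steps flatten with `ravel("F")`. The NumPy-only sketch below (no Spark needed) shows what that flattening produces and that reading the values back column-major, i.e. a Fortran-order reshape, recovers Vᵀ. The 5×3 `V` here is a hypothetical stand-in, not real `computeSVD` output.

```python
import numpy as np

# Hypothetical stand-in for V (n x k = 5 x 3); not real computeSVD output.
V = np.arange(15, dtype=float).reshape(5, 3)

# What the answer passes to DenseMatrix: V^T flattened column by column.
values = V.transpose().ravel("F")

# DenseMatrix(numRows=3, numCols=5, values) reads values column-major,
# which corresponds to a Fortran-order reshape in NumPy.
rebuilt = values.reshape((V.shape[1], V.shape[0]), order="F")
print(np.array_equal(rebuilt, V.T))  # True

# For Sigma the order does not matter, because diag(s) is symmetric.
s = np.array([3.0, 2.0, 1.0])
print(np.array_equal(np.diag(s).ravel("F"), np.diag(s).ravel("C")))  # True
```

Incidentally, `V.transpose().ravel("F")` is the same array as `V.ravel("C")`, so the transpose-then-flatten step could be written either way.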

Multiply:

# RowMatrix.multiply (Spark 2.2+) multiplies by a local DenseMatrix and returns a new RowMatrix.
mat_ = U.multiply(Σ).multiply(V_)
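`RowMatrix.multiply` multiplies each distributed row by a local matrix. Because matrix multiplication is associative, (UΣ)Vᵀ = U(ΣVᵀ), so one could also form the small k×n product ΣVᵀ on the driver and call `multiply` only once. The NumPy sketch below checks that both orders agree; `multiply_rows` is a hypothetical helper that merely pictures the row-by-row behaviour with a Python loop, it is not Spark code.

```python
import numpy as np

A = np.array([
    [0.0, 1.0, 0.0, 7.0, 0.0],
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)

# Hypothetical helper: multiply every row by the same local matrix,
# loosely mimicking what RowMatrix.multiply does per row.
def multiply_rows(rows, local):
    return [row @ local for row in rows]

# Two multiplications, as in the answer: (U Sigma) V^T.
two_step = multiply_rows(multiply_rows(list(U), Sigma), Vt)

# One multiplication, with Sigma folded into V^T first: U (Sigma V^T).
SVt = Sigma @ Vt  # small k x n matrix, cheap to build on the driver
one_step = multiply_rows(list(U), SVt)

print(np.allclose(two_step, one_step))  # True
print(np.allclose(one_step, A))         # True
```

In PySpark this would presumably mean building a single `DenseMatrix(len(s), V.numRows, (np.diag(s.toArray()) @ V.toArray().T).ravel("F"))` and calling `U.multiply` with it; treat that as an untested variant of the answer's two-step version.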

Inspect the results:

for row in mat_.rows.take(3): 
    print(row.round(12)) 
[0. 1. 0. 7. 0.]
[2. 0. 3. 4. 5.]
[4. 0. 0. 6. 7.]

Check the norm of the difference:

np.linalg.norm(np.array([r.toArray() for r in rows.collect()]) - np.array([r.toArray() for r in mat_.rows.collect()]))
1.2222842061189339e-14

Of course, the last two steps are only for testing; collecting all rows to the driver won't be feasible on real-life data.
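If a sanity check is still wanted on large data, the Frobenius norm does not require collecting whole matrices: ‖A − Â‖_F² is the sum, over rows, of each row's squared error, so it fits a map (per row pair) followed by a sum. In Spark that might roughly correspond to zipping the two row RDDs, assuming they have matching partitioning as `RDD.zip` requires. The NumPy sketch below only illustrates the arithmetic; a deliberately truncated rank-2 reconstruction is used so the error is clearly nonzero.

```python
import numpy as np

A = np.array([
    [0.0, 1.0, 0.0, 7.0, 0.0],
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Deliberately truncated (rank-2) reconstruction so the error is not ~0.
A_rec = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]

# "map" step: squared error of each pair of corresponding rows.
row_sq_errors = [float(np.sum((a - b) ** 2)) for a, b in zip(A, A_rec)]
# "reduce" step: sum the per-row values, then take the square root.
frobenius = np.sqrt(sum(row_sq_errors))

print(np.isclose(frobenius, np.linalg.norm(A - A_rec)))  # True
```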

Comment: I am surprised this works, as U is a `RowMatrix`, so `mat_ = U.multiply(Σ).multiply(V_)` returns the following error: `'RowMatrix' object has no attribute 'collect'` – laila Apr 19 '22 at 13:38