
=========update==========

I read the following in this book:

The matrix that is actually returned by TruncatedSVD is the dot product of the U and S matrices.

Then I tried just multiplying U and Sigma:

US = U.dot(Sigma)
print("==>> US: ", US)

This time it produces the same result, just with some signs flipped. So why doesn't TruncatedSVD need to multiply by VT?
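This can be checked directly. A minimal sketch (the `random_state` value is an arbitrary choice of mine, and the comparison uses absolute values because each singular vector is only determined up to sign):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2]], dtype=float)

k = 2
U, s, VT = np.linalg.svd(A)
US = U[:, :k] * s[:k]  # same as U[:, :k] @ np.diag(s[:k])

B = TruncatedSVD(n_components=k, random_state=0).fit_transform(A)

# Compare magnitudes: the columns agree up to a per-component sign flip
print(np.allclose(np.abs(US), np.abs(B)))
```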

==========previous question===========

I am learning SVD. I found that numpy and sklearn both provide related APIs, so I tried to use them for dimensionality reduction. Below is the code:

import numpy as np
np.set_printoptions(precision=2, suppress=True)
A = np.array([
    [1,1,1,0,0],
    [3,3,3,0,0],
    [4,4,4,0,0],
    [5,5,5,0,0],
    [0,2,0,4,4],
    [0,0,0,5,5],
    [0,1,0,2,2]])
U, s, VT = np.linalg.svd(A)
print("==>> U: ", U)
print("==>> VT: ", VT)

# create m x n Sigma matrix
Sigma = np.zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
square_len = min((A.shape[0], A.shape[1]))
Sigma[:square_len, :square_len] = np.diag(s)

print("==>> Sigma: ", Sigma)

n_elements = 2
U = U[:, :n_elements]
Sigma = Sigma[:n_elements, :n_elements]
VT = VT[:n_elements, :n_elements]

# reconstruct
B = U.dot(Sigma.dot(VT))
print("==>> B: ", B)

The output B is:

==>> B:  [[ 0.99  1.01]
 [ 2.98  3.04]
 [ 3.98  4.05]
 [ 4.97  5.06]
 [ 0.36  1.29]
 [-0.37  0.73]
 [ 0.18  0.65]]
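(Note that the `VT[:n_elements, :n_elements]` slice above also drops columns of VT. The standard rank-k reconstruction keeps all of VT's columns, so the approximation has the same shape as A. A minimal sketch:

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A)
k = 2

# Truncate only the rows of VT, keeping all 5 columns, so the
# rank-2 approximation has the same shape as A (7 x 5)
A_approx = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]
print(A_approx.shape)  # (7, 5)
```

This is a low-rank approximation of A, not a reduced-dimension embedding.)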

And this is the sklearn code:

import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([
    [1,1,1,0,0],
    [3,3,3,0,0],
    [4,4,4,0,0],
    [5,5,5,0,0],
    [0,2,0,4,4],
    [0,0,0,5,5],
    [0,1,0,2,2]]).astype(float)
svd = TruncatedSVD(n_components=2)
svd.fit(A)  # Fit model on training data A
print("==>> right singular vectors: ", svd.components_)
print("==>> svd.singular_values_: ", svd.singular_values_)
B = svd.transform(A)  # Perform dimensionality reduction on A.
print("==>> B: ", B)

Its last output is:

==>> B:  [[ 1.72 -0.22]
 [ 5.15 -0.67]
 [ 6.87 -0.9 ]
 [ 8.59 -1.12]
 [ 1.91  5.62]
 [ 0.9   6.95]
 [ 0.95  2.81]]

As we can see, they produce different results (though I notice their singular values are the same: both give 12.48 and 9.51). How can I make them the same? Did I misunderstand something?

Wade Wang

1 Answer


I think the correct way to perform dimensionality reduction of the array A with np.linalg.svd is:

n_elements = 2
U, s, Vh = np.linalg.svd(A)  # Vh is already V transposed
V = Vh.T
B = A @ V[:, :n_elements]    # project A onto the top right singular vectors

Now B is:

array([[-1.72,  0.22],
       [-5.15,  0.67],
       [-6.87,  0.9 ],
       [-8.59,  1.12],
       [-1.91, -5.62],
       [-0.9 , -6.95],
       [-0.95, -2.81]])

That is exactly what you get from TruncatedSVD, only with the signs flipped.
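The projection A·V and the product U·Σ are the same thing (since A = U·Σ·Vᵀ and V is orthogonal, A·V = U·Σ), which can be checked numerically with the same A as in the question:

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2]], dtype=float)

U, s, Vh = np.linalg.svd(A)  # Vh is V transposed
k = 2

proj = A @ Vh.T[:, :k]  # project A onto the top-2 right singular vectors
US = U[:, :k] * s[:k]   # equivalent to U[:, :k] @ np.diag(s[:k])

# A @ V = U @ Sigma @ Vh @ V = U @ Sigma, so the two match exactly
print(np.allclose(proj, US))  # True
```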

  • Because A = U*S*V^T and V is orthogonal, A*V = U*S, so this is actually the same thing as "The matrix that is actually returned by TruncatedSVD is the dot product of the U and S matrices". I don't quite understand which of U*S and U*S*V^T is more common or useful in real use cases – Wade Wang May 18 '22 at 09:15