I have an example word-by-document matrix (from Landauer and Dumais, 1997):
wxd <- matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,
0,0,1,1,1,1,1,0,1,0,0,0,
0,1,0,1,1,0,0,1,0,0,0,0,
1,0,0,0,2,0,0,1,0,0,0,0,
0,0,0,1,0,1,1,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,1,0,0,
0,0,0,0,0,0,0,0,0,1,1,0,
0,0,0,0,0,0,0,0,0,1,1,1,
0,0,0,0,0,0,0,0,1,0,1,1),
nrow = 12, ncol = 9)  # filled column-wise: each row of numbers above is one document
rownames(wxd) <- c("human", "interface", "computer", "user", "system",
"response", "time", "EPS", "survey", "trees", "graph", "minors")
colnames(wxd) <- c(paste0("c", 1:5), paste0("m", 1:4))
I can perform Singular Value Decomposition on this matrix using the svd() function and get three matrices U, S, and V:
SVD <- svd(wxd)
U <- SVD$u
S <- diag(SVD$d)
V <- SVD$v
I can multiply these matrices and get my original matrix back (within some small margin of error):
U %*% S %*% t(V)
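To put a number on that margin of error, here is a minimal sketch of how I would check the reconstruction. The names recon, max_err, and same are my own, and the block rebuilds the matrix (without dimnames) so that it runs on its own:

```r
# Rebuild the example matrix (same values as above, filled column-wise).
wxd <- matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,
                0,0,1,1,1,1,1,0,1,0,0,0,
                0,1,0,1,1,0,0,1,0,0,0,0,
                1,0,0,0,2,0,0,1,0,0,0,0,
                0,0,0,1,0,1,1,0,0,0,0,0,
                0,0,0,0,0,0,0,0,0,1,0,0,
                0,0,0,0,0,0,0,0,0,1,1,0,
                0,0,0,0,0,0,0,0,0,1,1,1,
                0,0,0,0,0,0,0,0,1,0,1,1),
              nrow = 12, ncol = 9)

SVD <- svd(wxd)

# Multiply the three factors back together.
recon <- SVD$u %*% diag(SVD$d) %*% t(SVD$v)

# Largest absolute deviation from the original matrix.
max_err <- max(abs(recon - wxd))

# all.equal() compares with a numeric tolerance instead of exact equality.
same <- isTRUE(all.equal(recon, wxd, check.attributes = FALSE))
```

I would expect max_err to be tiny (floating-point noise) and same to be TRUE.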
I can also take the first two columns of the U and V matrices and the first two rows and columns of the S matrix to get the best rank-2 least-squares approximation of the original data. This matches the results of the same procedure in the paper I mentioned above:
U[ , 1:2] %*% S[1:2, 1:2] %*% t(V[ , 1:2])
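As a sanity check on the "least squares" part, the sketch below compares the Frobenius norm of the residual with the discarded singular values. By the Eckart-Young theorem the two should agree. The names k, approx_k, err, and expected are my own, and the matrix is rebuilt so the block runs by itself:

```r
# Rebuild the example matrix (same values as above, filled column-wise).
wxd <- matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,
                0,0,1,1,1,1,1,0,1,0,0,0,
                0,1,0,1,1,0,0,1,0,0,0,0,
                1,0,0,0,2,0,0,1,0,0,0,0,
                0,0,0,1,0,1,1,0,0,0,0,0,
                0,0,0,0,0,0,0,0,0,1,0,0,
                0,0,0,0,0,0,0,0,0,1,1,0,
                0,0,0,0,0,0,0,0,0,1,1,1,
                0,0,0,0,0,0,0,0,1,0,1,1),
              nrow = 12, ncol = 9)

SVD <- svd(wxd)
k <- 2  # number of dimensions kept

# Rank-k approximation built from the leading k singular triplets.
approx_k <- SVD$u[, 1:k] %*% diag(SVD$d[1:k], nrow = k) %*% t(SVD$v[, 1:k])

# Frobenius norm of what the approximation misses.
err <- norm(wxd - approx_k, type = "F")

# Square root of the sum of squared discarded singular values.
expected <- sqrt(sum(SVD$d[-(1:k)]^2))

agree <- isTRUE(all.equal(err, expected))
```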
I want to make sure I understand what this function is doing (as best I can), and I have been able to generate V and S matrices that match those from the svd() function:
ATA <- t(wxd) %*% wxd
V2 <- eigen(ATA)$vectors
S2 <- sqrt(diag(eigen(ATA)$values))
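For reference, this is roughly how I compared the two routes. It is only a sketch: eig, d2, values_match, and vectors_match are my own names, I use crossprod(wxd) as a shorthand for t(wxd) %*% wxd, and I compare the eigenvectors by absolute value because I assume a column may come back with its sign flipped:

```r
# Rebuild the example matrix (same values as above, filled column-wise).
wxd <- matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,
                0,0,1,1,1,1,1,0,1,0,0,0,
                0,1,0,1,1,0,0,1,0,0,0,0,
                1,0,0,0,2,0,0,1,0,0,0,0,
                0,0,0,1,0,1,1,0,0,0,0,0,
                0,0,0,0,0,0,0,0,0,1,0,0,
                0,0,0,0,0,0,0,0,0,1,1,0,
                0,0,0,0,0,0,0,0,0,1,1,1,
                0,0,0,0,0,0,0,0,1,0,1,1),
              nrow = 12, ncol = 9)

SVD <- svd(wxd)

# Eigen decomposition of the 9 x 9 matrix t(wxd) %*% wxd.
eig <- eigen(crossprod(wxd), symmetric = TRUE)
V2 <- eig$vectors
d2 <- sqrt(eig$values)  # square roots of the eigenvalues

# The square-rooted eigenvalues should equal the singular values.
values_match <- isTRUE(all.equal(d2, SVD$d))

# The eigenvectors should equal the columns of V, up to the sign of each column.
vectors_match <- isTRUE(all.equal(abs(V2), abs(SVD$v)))
```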
But the U matrix I generate matches only in absolute value over its first 9 columns, and it has 3 additional columns. Some elements of this U matrix also have different signs than the U matrix from the svd() function:
AAT <- wxd %*% t(wxd)
U2 <- eigen(AAT)$vectors
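Here is a minimal sketch of how I looked at the mismatch. The names eigU, same_abs, col_dots, flipped, and tail_vals are my own, and tcrossprod(wxd) is shorthand for wxd %*% t(wxd):

```r
# Rebuild the example matrix (same values as above, filled column-wise).
wxd <- matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,
                0,0,1,1,1,1,1,0,1,0,0,0,
                0,1,0,1,1,0,0,1,0,0,0,0,
                1,0,0,0,2,0,0,1,0,0,0,0,
                0,0,0,1,0,1,1,0,0,0,0,0,
                0,0,0,0,0,0,0,0,0,1,0,0,
                0,0,0,0,0,0,0,0,0,1,1,0,
                0,0,0,0,0,0,0,0,0,1,1,1,
                0,0,0,0,0,0,0,0,1,0,1,1),
              nrow = 12, ncol = 9)

U <- svd(wxd)$u                                  # 12 x 9
eigU <- eigen(tcrossprod(wxd), symmetric = TRUE)
U2 <- eigU$vectors                               # 12 x 12

# First 9 columns agree once signs are ignored.
same_abs <- isTRUE(all.equal(abs(U2[, 1:9]), abs(U)))

# Dot product of each matching column pair: +1 if the signs agree, -1 if flipped.
col_dots <- colSums(U2[, 1:9] * U)
flipped <- which(col_dots < 0)

# Eigenvalues that belong to the 3 extra columns of U2.
tail_vals <- eigU$values[10:12]
```

In my session the eigenvalues in tail_vals are zero up to rounding error, and flipped lists the columns whose signs disagree with svd().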
So my question is: why is the U matrix different from the one svd() returns when I calculate it from scratch?