R- reduce dimensionality LSA

Question

I am following an example of svd, but I still don't know how to reduce the dimension of the final matrix:

a <- round(runif(10)*100)
dat <- as.matrix(iris[a,-5])
rownames(dat) <- c(1:10)

s <- svd(dat)

pc.use <- 1
recon <- s$u[,pc.use] %*% diag(s$d[pc.use], length(pc.use), length(pc.use)) %*% t(s$v[,pc.use])

But recon still has the same dimensions as dat. I need to use this for semantic analysis.

score 1 · Accepted Answer · answered Jul 12 '15 at 18:45

The code you provided does not reduce the dimensionality. Instead, it takes the first principal component of your data, discards the remaining components, and then reconstructs the data from that single PC.

You can check that this is happening by inspecting the rank of the final matrix:

library(Matrix)
rankMatrix(dat)
as.numeric(rankMatrix(dat))
[1] 4
as.numeric(rankMatrix(recon))
[1] 1
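
The same check can be sketched in base R, without the Matrix package. This is only an illustration: it assumes the first 10 rows of iris instead of the random sample in the question, and the names k, idx and scores are introduced here. It shows that the reconstruction keeps the original shape while having rank 1, whereas the scores on the kept component really do have fewer columns.

```r
# Assumption: a fixed sample (first 10 iris rows) so the result is reproducible
dat <- as.matrix(iris[1:10, -5])          # 10 samples x 4 features

s <- svd(dat)
k <- 1                                    # number of components kept
idx <- seq_len(k)

# Rank-k reconstruction: same shape as dat, but only k independent directions
recon <- s$u[, idx, drop = FALSE] %*% diag(s$d[idx], k, k) %*% t(s$v[, idx, drop = FALSE])

# Scores on the first k components: this is the reduced representation
scores <- s$u[, idx, drop = FALSE] %*% diag(s$d[idx], k, k)

dim(recon)      # 10 4, unchanged
qr(recon)$rank  # 1
dim(scores)     # 10 1
```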

If you want to reduce dimensionality (the number of features), you can select some of the principal components and compute the scores of your data on those components instead.

But first let's make some things clear about your data - it seems you have 10 samples (rows) with 4 features (columns). Dimensionality reduction will reduce the 4 features to a smaller set of features.

So you can start by transposing your matrix for svd():

dat <- t(dat)
dat
               1   2   3   4   5   6   7   8   9  10
Sepal.Length 6.7 6.1 5.8 5.1 6.1 5.1 4.8 5.2 6.1 5.7
Sepal.Width  3.1 2.8 4.0 3.8 3.0 3.7 3.0 4.1 2.8 3.8
Petal.Length 4.4 4.0 1.2 1.5 4.6 1.5 1.4 1.5 4.7 1.7
Petal.Width  1.4 1.3 0.2 0.3 1.4 0.4 0.1 0.1 1.2 0.3

Now you can repeat the svd. Centering the data before this procedure is advisable:

s <- svd(dat - rowMeans(dat))
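
A short sketch of why the subtraction above centers each feature, assuming the first 10 iris rows as stand-in data. With features in rows, R recycles the vector of row means down each column, so every row loses its own mean. The sweep() call is an equivalent, more explicit form.

```r
# Assumption: first 10 iris rows, transposed so features are rows
dat <- t(as.matrix(iris[1:10, -5]))        # 4 features x 10 samples

centered  <- dat - rowMeans(dat)           # relies on column-wise recycling
centered2 <- sweep(dat, 1, rowMeans(dat))  # explicit: subtract per-row means

all.equal(centered, centered2)             # TRUE
max(abs(rowMeans(centered)))               # numerically zero
```

Note that this shortcut only works with features in rows. If samples are in rows, `dat - colMeans(dat)` recycles in the wrong direction; `scale(dat, scale = FALSE)` or `sweep(dat, 2, colMeans(dat))` would be the safer choice there.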

The principal component scores can be obtained by projecting your data onto the PCs:

PCs <- t(s$u) %*% (dat - rowMeans(dat)) # project the centered data, i.e. the same matrix that was passed to svd()

Now if you want to reduce dimensionality by eliminating PCs with low variance you can do so like this:

dat2 <- PCs[1:2,] # would select first two PCs.
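
As a sanity check, the steps above can be compared against prcomp(). This is a sketch under assumptions: it uses the first 10 iris rows as stand-in data and projects the centered matrix, which is what prcomp() does. The variable var_explained is introduced here only for illustration. Signs of individual components are arbitrary, so absolute values are compared.

```r
# Assumption: first 10 iris rows, features in rows as in the answer
dat <- t(as.matrix(iris[1:10, -5]))   # 4 features x 10 samples
centered <- dat - rowMeans(dat)

s <- svd(centered)
PCs <- t(s$u) %*% centered            # 4 components x 10 samples
dat2 <- PCs[1:2, ]                    # keep the first two components

# prcomp() expects samples in rows and centers by default
p <- prcomp(t(dat))
all.equal(abs(t(dat2)), abs(p$x[, 1:2]), check.attributes = FALSE)

# Share of variance carried by each component, useful for choosing how many to keep
var_explained <- s$d^2 / sum(s$d^2)
round(var_explained, 3)
```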

but what about the original class? I need to keep a matrix with the original classes to classify the data set using for example knn. — GabyLP, Jul 13 '15 at 18:54
Hm, what classes? I didn't really see any class labels in your posted code. But let's go ahead: in the data there are 10 samples and 4 features. After PCA and selecting the first 2 PCs you will have 2 features and the same 10 samples. So all the samples are the same and their order is the same. You can use the same class labels if you have them. Just be aware that in my code samples are in the columns and features in the rows. — Karolis Koncevičius, Jul 13 '15 at 20:59
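
To illustrate the point about labels, here is a minimal sketch that is not from the original thread. It assumes the full iris data with Species as the class label, an arbitrary 100/50 train/test split, and k = 3. It uses prcomp() with samples in rows (rather than the transposed svd() layout above) and knn() from the class package. Because PCA does not reorder samples, the label vector can be indexed exactly like the score matrix.

```r
library(class)   # recommended package shipped with standard R installations

set.seed(1)                              # arbitrary seed, for reproducibility only
idx    <- sample(nrow(iris))             # shuffle the rows
X      <- as.matrix(iris[idx, -5])       # samples in rows, features in columns
labels <- iris$Species[idx]              # labels kept in the same order as X

train <- 1:100
test  <- 101:150

# Fit PCA on the training samples only, then project both sets onto 2 PCs
p        <- prcomp(X[train, ])
train_pc <- p$x[, 1:2]
test_pc  <- predict(p, newdata = X[test, ])[, 1:2]

# Row i of the scores still corresponds to labels[i]
pred <- knn(train_pc, test_pc, cl = labels[train], k = 3)
mean(pred == labels[test])               # fraction of test samples classified correctly
```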
