I am having difficulty understanding some steps in a procedure. In short, the authors take coordinate data, find the covariance matrix, apply PCA, then obtain the standard deviations as the square root of each eigenvalue. I am trying to reproduce this process, but I am stuck on the steps.

The Steps Taken

The data set consists of one matrix, R, that contains coordinate pairs, (x(i),y(i)) with i=1,...,N, where N is the total number of instances recorded. We applied PCA to the covariance matrix of the R input data set, and the following variables were obtained:

a) the principal components of the new coordinate system, the eigenvectors u and v, and

b) the eigenvalues (λ1 and λ2) corresponding to the total variability explained by each principal component.

With these variables, a graphical representation was created for each item. Two orthogonal segments were centred on the mean of the coordinate data. The segments’ directions were driven by the eigenvectors of the PCA, and the length of each segment was defined as one standard deviation (σ1 and σ2) around the mean, which was calculated by extracting the square root of each eigenvalue, λ1 and λ2.

My Steps

# reproducible data
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
# Note: my real data are not perfectly distributed in this fashion
df <- data.frame(x, y) # this is my R matrix

covar.df <- cov(df, use = "all.obs", method = "pearson") # this is my covariance matrix
pca.results <- prcomp(covar.df) # this applies PCA to the covariance matrix
pca.results$sdev # these are the standard deviations of the principal components,
                 # which is what I believe I am looking for.

This is where I am stuck, because I am not sure whether I should take the sdev output from prcomp() as it is or scale my data first. Both variables are on the same scale, so I do not see the need for scaling.

My second question is: how do I extract the standard deviation in the x and y directions?

Jack Armstrong

2 Answers

You don't apply prcomp to the covariance matrix; you apply it to the data itself.

result <- prcomp(df)

If by scaling you mean normalizing or standardizing, that happens before you call prcomp(). For an introduction to the procedure that walks you through the basics, see this link: pca on R. To get the sdev, call summary() on the result object or read the sdev element directly:

summary(result)
result$sdev
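
As a rough check that sdev is already the square root of each eigenvalue, the sketch below compares prcomp(df)$sdev with an eigen-decomposition of cov(df), reusing the data from the question. The names eig and sd_xy are introduced here for illustration only, and the last step assumes that "standard deviation in the x and y direction" means the ordinary per-axis standard deviations.

```r
# Reproducible data from the question
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)

# PCA on the data itself (centred, not scaled, which is the default)
result <- prcomp(df)

# Eigen-decomposition of the covariance matrix, for comparison
eig <- eigen(cov(df))

# Standard deviations of the principal components (sigma1, sigma2)
result$sdev
sqrt(eig$values)   # should agree with result$sdev

# Directions of the two segments; the sign of each column is arbitrary
result$rotation
eig$vectors

# Standard deviations along the original x and y axes
# (assumption: this is what "x and y direction" refers to)
sd_xy <- sqrt(diag(cov(df)))   # same values as sapply(df, sd)
sd_xy
```

Note that sd_xy and result$sdev generally differ: the first describes spread along the original axes, the second along the principal components.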
Diegolog
  • Then what would the covariance matrix be used for if you apply prcomp to the data itself? Also, I understand when to normalize and standardize, but both x and y are on the same scale, so I do not see the need to do it. Also, this does not answer the part about how to extract the sdev from prcomp(). – Jack Armstrong Apr 29 '19 at 14:20
  • I added some edits to clarify. The prcomp command works from the data itself: it centers the data and finds the rotation internally (via a singular value decomposition, which is equivalent to an eigen-decomposition of the covariance matrix). Scaling should be applied when the variables are not on the same scale. The standard deviations are not those of x and y but of the components that have been computed. – Diegolog Apr 29 '19 at 14:33
  • The only reason to scale is to account for variables whose variances differ because they are measured on different scales. X and Y are on the same scale. – Jack Armstrong Apr 29 '19 at 15:11
  • Also, that sdev is already the square root of each eigenvalue, right? – Jack Armstrong Apr 29 '19 at 15:16
  • Short answer: yes, they are the square roots of the eigenvalues, since the eigenvalues are the variances of the principal components. Long answer: the standard deviations of the components are the singular values of the centered data matrix (that is why we scale) divided by sqrt(N - 1), which equal the square roots of the eigenvalues of the covariance matrix. – Diegolog Apr 29 '19 at 15:59
  • I see why you scale. The thing is that I need the unscaled values to find the area, as in the procedure above, because I am using it in another part of a project. If the data were scaled, I believe it would distort the area calculation. – Jack Armstrong Apr 29 '19 at 16:27
  • Sorry, that application is beyond the scope of the forum and my knowledge. Please mark the answer as correct if it was helpful. – Diegolog Apr 29 '19 at 17:10
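
The singular-value relationship raised in the comments above can be checked with a short sketch. It assumes the data from the question and no scaling, so the result stays in the original units. The names centred, sv and sdev_from_svd are hypothetical helpers, not part of prcomp.

```r
# Reproducible data from the question
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)

# Centre each column but do not rescale it
centred <- scale(df, center = TRUE, scale = FALSE)

# Singular values of the centred data matrix
sv <- svd(centred)$d

# Dividing by sqrt(N - 1) gives the component standard deviations
sdev_from_svd <- sv / sqrt(nrow(df) - 1)

sdev_from_svd
prcomp(df)$sdev   # default scale. = FALSE, so these should agree
```

Because scale. defaults to FALSE in prcomp, the sdev values here are on the scale of the original coordinates.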

You don't apply prcomp to the covariance matrix; you apply it to the data. Setting scale.=TRUE bases the PCA on the correlation matrix, and scale.=FALSE (the default) bases it on the covariance matrix.

df.cor <- prcomp(df, scale. = TRUE)
df.cov <- prcomp(df, scale. = FALSE)
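
A minimal sketch of that distinction, assuming the data from the question: the squared sdev of each fit is compared with the eigenvalues of cor(df) and cov(df) respectively.

```r
# Reproducible data from the question
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)

df.cor <- prcomp(df, scale. = TRUE)    # PCA based on the correlation matrix
df.cov <- prcomp(df, scale. = FALSE)   # PCA based on the covariance matrix

# Variances of the components versus eigenvalues of each matrix
df.cor$sdev^2
eigen(cor(df))$values   # should agree with df.cor$sdev^2

df.cov$sdev^2
eigen(cov(df))$values   # should agree with df.cov$sdev^2
```

Only the covariance-based fit keeps the component standard deviations in the original units of x and y.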
Hamed Said