I use prcomp to run PCA in R. When I print the summary (standard deviation, proportion of variance, cumulative proportion), the results are always sorted, and the original column names are replaced by PC1, PC2, and so on. As a result, I cannot tell what proportion of variance belongs to each column.

Can anyone show me, or give me a hint on, how to display the column names in the summary output? Two screenshots of the results are attached here:

enter image description here

enter image description here

Dale K
Harry
  • Thank you, d.b. You are right, and I think I need to revise my question. I need to plot some figures to determine how each dimension contributes to PC1 and PC2. Another question: in this case the first two PCs together explain only 38% of the variance, so how should I select the right dimensions to represent the problem? Thanks – Harry Oct 14 '19 at 18:04

1 Answer

It is not clear that you understand what principal components analysis does. It reduces the dimensionality of the data. Assuming the rows are observations and the columns are variables, imagine plotting your rows in 35 dimensions (one per column). Most people have trouble visualizing more than 3 dimensions. Principal components analysis creates a smaller set of axes that explains most of the variation in the data. The axes are orthogonal, meaning they are at right angles to one another. Your plot and the output of summary(res.pca5) and plot(res.pca5) show that the first dimension explains 28% of the variation in the 35 variables. Adding a second dimension gives you almost 38%, and a third gives you 44%. These new variables are linear combinations of your original variables, not the original variables themselves, which is why the summary labels them PC1, PC2, and so on instead of using your column names. The first two components explain more of the variability than any other pair of orthogonal axes could.
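To make the percentages above concrete, here is a minimal sketch of where the "proportion of variance" row comes from. It uses the built-in USArrests data set as a stand-in, since the 35-column data behind res.pca5 is not shown, and the object names (pca, var_explained, cum_explained) are illustrative only. Scaling the variables with scale. = TRUE is also an assumption, not something the question confirms.

```r
# Stand-in data: built-in USArrests (4 variables), not the asker's 35 columns.
pca <- prcomp(USArrests, scale. = TRUE)

# Each component's variance is its standard deviation squared; dividing by the
# total variance gives the proportion explained by that component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
names(var_explained) <- colnames(pca$rotation)  # PC1, PC2, ...
cum_explained <- cumsum(var_explained)

# summary() stores the same figures (rounded) in its importance matrix.
importance <- summary(pca)$importance

print(round(rbind(proportion = var_explained, cumulative = cum_explained), 3))
```

The proportions are always in decreasing order because the components are defined that way: PC1 is the direction of greatest variance, PC2 the next greatest among directions orthogonal to PC1, and so on. They describe the components, not the individual input columns.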

For some reason you did not try res.pca5 as a command (or the equivalent print(res.pca5)), which would show you the coefficients (loadings) that prcomp used to create the components from the original variables, or biplot(res.pca5), which plots the rows and columns in the new two-dimensional space.
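As a rough illustration of how to see the original column names again, the coefficients live in the rotation element of the prcomp result. The sketch below again uses USArrests as a stand-in for the real data, and the names pca, loadings, contrib_pct and scores are made up for the example. Reading squared loadings as each variable's percentage share of a component is one common convention, offered here as an assumption about what "contribution" should mean.

```r
# Stand-in data: built-in USArrests, not the asker's 35 columns.
pca <- prcomp(USArrests, scale. = TRUE)

# Rows are the original columns, columns are the components.
loadings <- pca$rotation
print(round(loadings[, 1:2], 3))

# Each column of loadings is a unit vector, so squared loadings sum to 1 per
# component and can be read as each variable's share of that component.
contrib_pct <- 100 * loadings^2
print(round(contrib_pct[, 1:2], 1))

# The scores are the centred and scaled data multiplied by the loadings,
# i.e. each component is a weighted sum of the original variables.
scores <- scale(USArrests) %*% loadings
stopifnot(isTRUE(all.equal(scores, pca$x, check.attributes = FALSE)))

# biplot(pca)  # uncomment to draw observations and variables on PC1/PC2
```

The sign of a loading shows the direction in which the variable moves along the component, so a large negative weight matters as much as a large positive one. The sign of an entire component is arbitrary and can flip between software or runs.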

dcarlson
  • Thank you dcarlson, your explanation is very clear and helpful. I am new to PCA. The question itself is wrong and should be deleted, lol. I have a few questions and hopefully you can help me. 1) I have 35 PCs here; what are the criteria for selecting the significant ones: a contribution greater than, say, 5% for each individual PC, or enough PCs that the cumulative proportion exceeds, say, 90%? 2) I tried res.pca5 for the coefficients. Are these coefficients linear weights? How should I interpret the + and - signs? Are they equally significant regardless of sign? – Harry Oct 14 '19 at 19:33
  • 3) I am using PCA to select relevant factors for further study (maybe this method is wrong). What is your opinion on selecting these relevant factors? Should I select those with larger coefficients, say an absolute value above 20%, in the first three or four significant PCs? Your help is much appreciated! – Harry Oct 14 '19 at 19:41
  • The answers to your questions depend on what your data are and what you are trying to find out. For example, PCA can be used on raw data or on standard scores, and one approach is better than the other for certain questions. There are several books on principal components analysis, and there is probably published literature on studies such as the one you are doing that could provide guides to the accepted/preferred practices in your field. – dcarlson Oct 14 '19 at 19:49
  • OK. I will do more searching. Thanks – Harry Oct 14 '19 at 20:09
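The raw-data versus standard-scores point from the comments can be sketched as follows. This is only an illustration on the built-in USArrests data, not the data from the question. The helper names cum_prop and n_components are hypothetical, and the 90% threshold is simply the figure floated in the comment above, not a recommended rule.

```r
# Stand-in data: built-in USArrests, whose columns are on very different scales.
pca_raw <- prcomp(USArrests)                 # raw data, centred only
pca_std <- prcomp(USArrests, scale. = TRUE)  # standard scores (unit variance)

# Cumulative proportion of variance explained by the first k components.
cum_prop <- function(fit) cumsum(fit$sdev^2) / sum(fit$sdev^2)

# Smallest number of components whose cumulative proportion reaches the
# threshold (hypothetical helper; 0.90 is just the example from the comments).
n_components <- function(fit, threshold = 0.90) {
  which(cum_prop(fit) >= threshold)[1]
}

print(round(cum_prop(pca_raw), 3))
print(round(cum_prop(pca_std), 3))
print(c(raw = n_components(pca_raw), standardized = n_components(pca_std)))
```

On raw data, the variable with the largest variance tends to dominate the first component, so the number of components that a cutoff keeps can change a lot once the variables are standardized. Which version is appropriate depends on the question being asked, as noted above.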