I applied PCA to my biomedical data (31 genes as rows and 1904 patients as columns) and kept 9 components. The result is two matrices, one of which is a 9 by 1904 matrix (I call it matrix A).
In matrix A, the rows are the 9 components, the columns are the 1904 patients, and the entries are continuous values. I want to find the components in which a single patient, out of the 1904, accounts for more than 10% of that component's variance (such a patient can be regarded as an outlier in that component). Finally, I plan to remove the components identified this way.
For example, I compute the variance across patients within each component. If I find that in Component 3 exactly one of the 1904 patients accounts for >10% of the variance, I consider that this component contains an outlier, and I remove Component 3 from my components.
I am stuck on how to do this in R. Any ideas are appreciated! Thanks in advance.
UPDATE: Here is what I have tried so far.
The dummy data df has 10 patients as rows and 3 components as columns (i.e. it is transposed relative to matrix A):
df=structure(c(-0.17134779227884, -0.0962044733094678, 0.0683562125182872,
-0.243465849606547, 0.333327443120999, -0.124616446710062, 0.213423949350221,
-0.086118378436248, 0.209279578622201, 0.425834454279314, 0.16728832317405,
0.952243725136014, -0.101114176191555, 0.187773366984759, 0.207570066964501,
-0.117920965767025, 0.939250613987857, -0.00465861655152568,
-0.288348010784738, 0.0469224124443503, -0.165934907003698, -0.18339647933408,
-0.098550778268536, -0.094031840482207, 0.0759839405752319, -0.141524045263773,
-0.0665849661695848, -0.442355221875939, -0.156962689636778,
-0.142727471861712), .Dim = c(10L, 3L), .Dimnames = list(c("MB-0362",
"MB-0346", "MB-0386", "MB-0574", "MB-0503", "MB-0641", "MB-0201",
"MB-0218", "MB-0316", "MB-0189"), c("comp 1", "comp 2", "comp 3"
)))
I tried to compute the variance across patients within each of the three components:
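library(dplyr)   # %>%, group_by(), mutate()
library(tidyr)   # pivot_longer()
df1 = as.data.frame(df)
df1$Patients = rownames(df)
df1 = df1 %>%
  pivot_longer(-Patients, names_to = "Component", values_to = "Weight") %>%
  group_by(Component) %>%
  mutate(var = var(Weight))  # per-component variance, repeated on every row of that component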
Now I need to compute the percentage of each component's variance that each patient contributes. This is where I am stuck :(
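
One way to make "percentage of variance contributed by a patient" concrete is sketched below. It rests on an assumption that is not stated above: a patient's contribution is taken to be their squared deviation from the component mean, divided by the sum of squared deviations over all patients in that component, so the shares in each component add up to 1. The function names `variance_share` and `flag_components` and the `toy` matrix are hypothetical and only for illustration. The input is expected with patients as rows and components as columns, like df, so matrix A would need `t(A)` first.

```r
# Share of each component's variance attributable to each patient.
# m: numeric matrix, patients in rows, components in columns.
variance_share <- function(m) {
  centered <- sweep(m, 2, colMeans(m), "-")   # subtract each component's mean
  sq <- centered^2                            # squared deviations
  sweep(sq, 2, colSums(sq), "/")              # each column now sums to 1
}

# Names of the components in which exactly one patient exceeds the threshold.
flag_components <- function(m, threshold = 0.10) {
  share <- variance_share(m)
  n_over <- colSums(share > threshold)        # patients above threshold, per component
  colnames(m)[n_over == 1]
}

# Hypothetical toy data: in "comp 1" a single patient (P5) dominates.
toy <- matrix(c(0, 0, 0, 0, 10,
                1, 2, 3, 4, 5),
              ncol = 2,
              dimnames = list(paste0("P", 1:5), c("comp 1", "comp 2")))

# With only 5 patients the average share is already 20%,
# so a higher threshold is used for the toy data.
flagged <- flag_components(toy, threshold = 0.5)
flagged                                        # "comp 1"

# Drop the flagged components.
toy_kept <- toy[, !(colnames(toy) %in% flagged), drop = FALSE]
```

With the real data this would be `flag_components(t(A), threshold = 0.10)`. Note that with the 10-patient dummy df the average share per patient is already 10%, so the 10% cut-off is only meaningful with many patients, as in the 1904-patient case.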