
I'm trying to plot PCA scores with ggbiplot, but I can't because my scores and my groupings have different lengths. I think the mismatch comes from the NA values in my original log-transformed data, which I omit when I calculate the PCs. Is there a way around this so that I can plot with ggbiplot?

GCelem_trans.pca <- prcomp(na.omit(GCelem_trans.log), center = TRUE, scale. = TRUE)
GCelem.rxtp <- GCelem[, 9]  # grouping variable: one value per row of GCelem, NA rows included


g <- ggbiplot(GCelem_trans.pca, obs.scale = 1, var.scale = 1, 
                  groups = GCelem.rxtp, ellipse = TRUE, 
                  circle = TRUE)
    Error in `$<-.data.frame`(`*tmp*`, "groups", value = c(2L, 5L, 5L, 2L,  : 
      replacement has 33337 rows, data has 30804

Should I recalculate GCelem.rxtp based on a copy of GCelem that omits any rows with NAs?

val
I think the error occurs because you haven't subset `GCelem.rxtp` to match the NA-omitted data. You could find the indices of the NA rows and drop those entries from `GCelem.rxtp`, or you could impute the NAs in the original data (for example, with the median of each variable). – Joe Jan 19 '17 at 11:15
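A minimal sketch of both approaches from the comment, on small made-up data. `dat` and `grp` are hypothetical stand-ins for `GCelem_trans.log` and `GCelem[, 9]`, and their rows are assumed to line up one-to-one. Approach 1 drops the NA rows from the groups as well; approach 2 fills each NA with its column median so no rows are dropped.

```r
# Hypothetical toy data: 'dat' stands in for GCelem_trans.log,
# 'grp' for GCelem[, 9]; both have one entry per original row.
set.seed(1)
dat <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
dat$a[c(2, 7)] <- NA
dat$c[5] <- NA
grp <- factor(rep(c("x", "y"), length.out = 10))

# Approach 1: find the rows na.omit() will drop and remove them from grp too.
na_rows <- which(!complete.cases(dat))
pca1 <- prcomp(na.omit(dat), center = TRUE, scale. = TRUE)
# Guard: grp[-integer(0)] would return an EMPTY vector, so only subset when needed.
grp1 <- if (length(na_rows) > 0) grp[-na_rows] else grp

# Approach 2: replace each NA with its column median so every row is kept.
dat_imp <- as.data.frame(lapply(dat, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
}))
pca2 <- prcomp(dat_imp, center = TRUE, scale. = TRUE)
grp2 <- grp
```

Either pair (`pca1` with `grp1`, or `pca2` with `grp2`) should then have matching lengths for `ggbiplot(..., groups = ...)`. Median imputation keeps every observation but pulls the filled-in values toward the centre, which can understate variance, so it is a trade-off rather than a free fix.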



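You can subset the grouping variable with `complete.cases()` so that it keeps only the rows that survive `na.omit()` (this assumes the rows of `GCelem` and `GCelem_trans.log` are in the same order):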

GCelem.rxtp <- GCelem[complete.cases(GCelem_trans.log), 9]
HubertL
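A small self-contained check of this answer, assuming the rows of `GCelem` and `GCelem_trans.log` are in the same order. `raw` and `logged` are hypothetical toy stand-ins, with the grouping in column 3 instead of 9. It also shows that `na.action()` on the `na.omit()` result reports exactly the rows that `complete.cases()` flags, which is a quick way to confirm the two stay in sync.

```r
# Hypothetical toy data: 'raw' stands in for GCelem (grouping in column 3),
# 'logged' for GCelem_trans.log; rows are assumed to line up one-to-one.
set.seed(42)
raw <- data.frame(x1 = runif(8, 1, 10), x2 = runif(8, 1, 10),
                  grp = rep(c("A", "B"), 4))
raw$x1[3] <- NA
logged <- log(raw[, c("x1", "x2")])  # log(NA) stays NA

# Keep only rows with no NA in the PCA input, for both the data and the groups.
keep <- complete.cases(logged)
pca <- prcomp(logged[keep, ], center = TRUE, scale. = TRUE)
groups <- raw[keep, 3]

# Cross-check: na.omit() stores the dropped row numbers in its "na.action" attribute.
dropped <- na.action(na.omit(logged))
```

Because `keep` is computed from the same data frame that goes into `prcomp()`, the scores and `groups` have the same length by construction.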