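PCA is an unsupervised method used for descriptive rather than predictive modeling, so we don't usually think of projecting data with PCA as training per se. However, it is possible to define a PCA space with one dataset and ask where new data falls in that same space. You were on the right track multiplying the new data matrix by p$var$coord with %*%. One caveat: in FactoMineR, p$var$coord holds the variable coordinates (the eigenvectors scaled by the square roots of the eigenvalues), so the actual rotation matrix is p$var$coord with each column divided by the square root of its eigenvalue (equivalently, p$svd$V).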
Note that you have to apply the same centering and scaling to your new data, using the means and standard deviations of the original data rather than those of the new data. This is also discussed here.
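To make the centering and scaling point concrete, here is a minimal sketch using base R's prcomp rather than FactoMineR (a swap made only because prcomp stores its centre, scale and rotation under well-known names). The object names idx, train_x, test_x and fit are illustrative, not from the question.

```r
# Minimal sketch with base R's prcomp (not FactoMineR); object names are illustrative
set.seed(1)
idx <- sample(nrow(iris), nrow(iris) / 2)
train_x <- iris[idx, -5]
test_x  <- iris[-idx, -5]

# define the PCA space on the 'training' half only
fit <- prcomp(train_x, center = TRUE, scale. = TRUE)

# centre and scale the new rows with the TRAINING means and standard deviations
test_scaled <- scale(test_x, center = fit$center, scale = fit$scale)

# rotate the scaled rows into the PCA space defined by the training data
manual <- test_scaled %*% fit$rotation

# predict() applies the same centring, scaling and rotation, so the two should agree
all.equal(manual, predict(fit, newdata = test_x), check.attributes = FALSE)
```

If the new rows were scaled with their own means and standard deviations instead, they would land in a slightly different space and could not be compared directly with the original points.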
Here's an example with the iris dataset where we define a PCA projection with half the data and then project the other half into that PCA space by %*%ing by the rotation.
library(tidyverse)
library(FactoMineR)
# split data
set.seed(1)
splits <- sample(nrow(iris), nrow(iris)/2)
train <- iris[splits,]
test <- iris[-splits,]
# build PCA rotation based on 'training' data
p <- train[,-5] %>% PCA(graph = FALSE)
# projection of training data
p$ind$coord %>%
  as.data.frame() %>%
  bind_cols(., Species = train[,5]) %>%
  ggplot(aes(Dim.1, Dim.2)) +
  ggtitle("original points") +
  geom_point(aes(color = Species))

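# scale and project 'test' data into original PCA space
# p$var$coord holds loadings (eigenvectors * sqrt(eigenvalue));
# dividing each column by sqrt(eigenvalue) recovers the rotation matrix
rotation <- sweep(p$var$coord, 2, sqrt(p$eig[seq_len(ncol(p$var$coord)), 1]), "/")
# scale with the training standard deviations (p$call$ecart.type);
# scale = TRUE would rescale using the test data itself
test[,-5] %>%
  scale(center = p$call$centre, scale = p$call$ecart.type) %*% rotation %>%
  as.data.frame() %>%
  bind_cols(., Species = test[,5]) %>%
  ggplot(aes(Dim.1, Dim.2)) +
  ggtitle("projection of new points into original space") +
  geom_point(aes(color = Species))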

Created on 2022-02-12 by the reprex package (v2.0.1)
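As a cross-check, FactoMineR also provides a predict method for PCA objects, which should give the coordinates of new rows without any manual matrix algebra. The sketch below repeats the same split and compares a manual projection with predict(). It assumes the fitted object exposes p$call$centre, p$call$ecart.type and p$eig as in current FactoMineR versions. The names rotation, manual and pred are illustrative.

```r
library(FactoMineR)

# same split as in the example above
set.seed(1)
splits <- sample(nrow(iris), nrow(iris) / 2)
train <- iris[splits, ]
test  <- iris[-splits, ]

p <- PCA(train[, -5], graph = FALSE)

# manual projection: training centre and sd, then the rotation recovered from the loadings
rotation <- sweep(p$var$coord, 2, sqrt(p$eig[seq_len(ncol(p$var$coord)), 1]), "/")
manual <- scale(test[, -5], center = p$call$centre, scale = p$call$ecart.type) %*% rotation

# FactoMineR's predict method for PCA objects; coordinates are in pred$coord
pred <- predict(p, newdata = test[, -5])

# compare the two sets of coordinates
all.equal(unname(manual), unname(pred$coord), check.attributes = FALSE)
```

If the comparison does not report equality on your installation, check the package version and how the PCA was weighted, since this sketch assumes the default row and column weights.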