I am trying to run a principal components regression (PCR) in R. Usually I would run a principal components analysis (PCA), but my variables are multicollinear and I have read that PCR can handle this.
I am using the `pcr` function from the `pls` package. This function requires a formula to identify the variables to be compared. I want to compare every variable against every other variable, the way a PCA does. However, with this function I can only work out how to compare one variable against all the others, and the result changes depending on which variable I choose. Of course, it is possible that I am not understanding PCR correctly.
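For contrast, this is roughly the symmetric analysis I have in mind. It is only a minimal sketch, assuming base R's `prcomp` on the numeric columns of `iris` with centering and scaling switched on; the names `num_vars`, `pca` and `scores` are just placeholders I chose for illustration.

```r
# Sketch: an ordinary PCA, where no variable is singled out as a response.
# Assumption: only the numeric columns are used, centered and scaled.
num_vars <- iris[, sapply(iris, is.numeric)]
pca <- prcomp(num_vars, center = TRUE, scale. = TRUE)

# Scores on the first two principal components, one row per observation
scores <- as.data.frame(pca$x[, 1:2])
head(scores)
```

Here every variable enters the decomposition on the same footing, which is the behaviour I would like to reproduce.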
Here is an example using the `iris` data set.
library(pls)
library(ggplot2)
Comparing `Petal.Length` to all other variables:
# PCR with Petal.Length as the response and all other variables as predictors
ir.pcr <- pcr(Petal.Length ~ ., data = iris, validation = "CV")
# Take the first two component scores from the PCR for ggplot
df <- data.frame(Comp1 = ir.pcr$scores[, 1], Comp2 = ir.pcr$scores[, 2])
ggplot(data = df, aes(x = Comp1, y = Comp2)) +
  geom_point(aes(fill = iris$Species), shape = 21, colour = "black", size = 3)
Using `Sepal.Width` compared to every other variable:
# PCR with Sepal.Width as the response and all other variables as predictors
ir.pcr <- pcr(Sepal.Width ~ ., data = iris, validation = "CV")
# Take the first two component scores from the PCR for ggplot
df <- data.frame(Comp1 = ir.pcr$scores[, 1], Comp2 = ir.pcr$scores[, 2])
ggplot(data = df, aes(x = Comp1, y = Comp2)) +
  geom_point(aes(fill = iris$Species), shape = 21, colour = "black", size = 3)
My understanding is that including a `.` after `~` in a formula means 'compare to everything else'. If so, how can I essentially write `.~.` so that every variable is compared to every other variable?
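To check my reading of the `.` shorthand, here is a small sketch using base R's `terms()` to expand the formula against `iris`. The names `f`, `tt` and `rhs` are placeholders of my own, and this only inspects the formula; it does not call `pcr` itself.

```r
# Sketch: see which columns `.` stands for on the right-hand side.
f <- Petal.Length ~ .
tt <- terms(f, data = iris)

# Every column of iris except the response ends up as a predictor
rhs <- attr(tt, "term.labels")
print(rhs)
```

So with `Petal.Length` on the left, the predictor set is every other column (including the factor `Species`), and changing the response changes which columns are on the right.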