I want to do a simple correlation analysis with ggcorrplot and ggpairs. However, I want to compare one group of continuous variables against another group only, not all against all. Let me explain what I mean below.
I am using the mtcars data for this example. I do the following:
data(mtcars)
mtcars
summary(mtcars)
mtcars$vs <- as.character(mtcars$vs)
grDevices::pdf(file="test1.pdf", height=6, width=6)
print(
  ggcorrplot::ggcorrplot(round(cor(mtcars[, 1:7]), 1),
                         p.mat = ggcorrplot::cor_pmat(mtcars[, 1:7]),
                         hc.order = TRUE, type = "lower", method = "circle")
)
grDevices::dev.off()
grDevices::pdf(file="test2.pdf", height=10, width=10)
print(
  GGally::ggpairs(mtcars, columns = 1:7, ggplot2::aes(colour = vs), legend = 1,
                  lower = list(continuous = GGally::wrap("smooth", alpha = 0.5, size = 2)),
                  upper = list(continuous = GGally::wrap("cor", size = 3.5))) +
    ggplot2::theme_light() + ggplot2::theme(legend.position = "bottom")
)
grDevices::dev.off()
Which produces the following plots:
However, I do not want all variables against all. What I would really like is to compare the variables mpg, cyl, disp and hp (on the x axis) against the variables drat, wt and qsec (on the y axis).
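To make the target concrete, here is a minimal sketch of the numbers I am after, assuming base R's two-argument form cor(x, y), which correlates the columns of x with the columns of y. The names x_vars, y_vars and rect_cor are just placeholders I made up for illustration, and this only gives me the matrix, not the plot:

```r
data(mtcars)

# Hypothetical names for the two groups of variables
x_vars <- c("mpg", "cyl", "disp", "hp")   # wanted on the x axis
y_vars <- c("drat", "wt", "qsec")         # wanted on the y axis

# cor(x, y) computes only the cross-group correlations,
# giving one row per x variable and one column per y variable
rect_cor <- cor(mtcars[, x_vars], mtcars[, y_vars])

round(rect_cor, 1)
```

What I am missing is a way to get a ggcorrplot-like or ggpairs-like figure from this rectangular layout.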
Ideally, we would not just crop the full plot, but rather compute only the relevant pairs to save time. In fact, imagine my starting data is split into 2 input data frames holding the different continuous variables (but with the same row names and the same categorical variable, in this case vs), so more like this:
mydata1 <- mtcars[,c(8,1:4)]
mydata2 <- mtcars[,c(8,5:7)]
> head(mydata1)
                  vs  mpg cyl disp  hp
Mazda RX4          0 21.0   6  160 110
Mazda RX4 Wag      0 21.0   6  160 110
Datsun 710         1 22.8   4  108  93
Hornet 4 Drive     1 21.4   6  258 110
Hornet Sportabout  0 18.7   8  360 175
Valiant            1 18.1   6  225 105
> head(mydata2)
                  vs drat    wt  qsec
Mazda RX4          0 3.90 2.620 16.46
Mazda RX4 Wag      0 3.90 2.875 17.02
Datsun 710         1 3.85 2.320 18.61
Hornet 4 Drive     1 3.08 3.215 19.44
Hornet Sportabout  0 3.15 3.440 17.02
Valiant            1 2.76 3.460 20.22
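For the two-data-frame case, the closest I have got is the base R sketch below. It assumes the rows of both data frames are in the same order (checked explicitly), drops the vs column, and builds a rectangular p-value matrix with cor.test as a rough stand-in for what cor_pmat gives me in the all-against-all case. The names x, y, rect_cor and rect_p are my own placeholders:

```r
data(mtcars)
mydata1 <- mtcars[, c(8, 1:4)]
mydata2 <- mtcars[, c(8, 5:7)]

# Assumption: both inputs describe the same observations in the same order
stopifnot(identical(rownames(mydata1), rownames(mydata2)))

# Drop the categorical column (vs) and keep only the continuous variables
x <- mydata1[, -1]
y <- mydata2[, -1]

# Cross-group correlations only: rows follow x, columns follow y
rect_cor <- cor(x, y)

# Matching matrix of Pearson p-values, one cor.test per (x, y) pair
rect_p <- sapply(y, function(yv) {
  sapply(x, function(xv) cor.test(xv, yv)$p.value)
})
```

This does only the 4 x 3 calculations I need, but it leaves me without the plot.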
Any idea how to approach this? I am not sure ggcorrplot and ggpairs are the best tools for this, but I would stick with them if possible because I am more familiar with them, or at least use something that produces similar ggplot2-like output. Thanks!