R: pairs plot of one variable with the rest of the variables

Question

I would like to generate a correlation plot that pairs my "True" variable (Salary in the example below) with each of the remaining variables (the People columns). I am pretty sure this has been asked before, but the solutions I have found do not work for me.

library(ggplot2)
set.seed(0)

dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))
ggplot(dt, aes(x=Salary, y=value)) +
  geom_point() + 
  facet_grid(.~Salary)

This gives the error: Error: Column y must be a 1d atomic vector or a list.

I know one solution is to write out every variable as y, which I am trying to avoid because my real data has 15 columns.

Also, I am not entirely sure what "value" and "variable" refer to in ggplot calls. I see them a lot in example code.

Any suggestion is appreciated!

`y = value` has no meaning as there is no `value` column in your `dt`. Are you trying to plot salary amounts against number of people in different groups? – Croote, Mar 01 '19 at 04:03
What I mean is: would you like Salary vs. People1 | Salary vs. People2 ... and so on? – Croote, Mar 01 '19 at 04:08

Tung · Accepted Answer · 2019-03-01T08:00:20.087

You want to convert your data from wide to long format, for example with tidyr::gather(). Here is a solution using packages from the tidyverse:

library(tidyr)
library(ggplot2)
theme_set(theme_bw(base_size = 14))

set.seed(0)
dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))

### convert data frame from wide to long format
dt_long <- gather(dt, key, value, -Salary)
head(dt_long)
#>      Salary     key     value
#> 1 106.31477 People1  98.87866
#> 2  98.36883 People1 101.88698
#> 3 106.64900 People1 100.66668
#> 4 106.36215 People1 104.02095
#> 5 102.07321 People1  99.71447
#> 6  92.30025 People1 102.51804

### plot
ggplot(dt_long, aes(x = Salary, y = value)) +
  geom_point() +
  facet_grid(. ~ key)

### if you want to add regression lines
library(ggpmisc)

# define regression formula
formula1 <- y ~ x

ggplot(dt_long, aes(x = Salary, y = value)) +
  geom_point() +
  facet_grid(. ~ key) +
  geom_smooth(method = 'lm', formula = formula1, se = TRUE) +
  stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~")), 
               label.x.npc = "left", label.y.npc = "top",
               formula = formula1, parse = TRUE, size = 3) +
  coord_equal()

### if you also want ggpairs() from the GGally package
library(GGally)
ggpairs(dt)

Created on 2019-02-28 by the reprex package (v0.2.1.9000)
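
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshape with the newer function, assuming tidyr >= 1.0.0 is installed. The column names key and value are arbitrary labels chosen here to match the answer above, not anything ggplot2 requires.

```r
library(tidyr)
library(ggplot2)

set.seed(0)
dt <- data.frame(matrix(rnorm(120, 100, 5), ncol = 6))
colnames(dt) <- c("Salary", paste0("People", 1:5))

# Stack every column except Salary into two new columns:
# `key` holds the original column name, `value` holds the number.
dt_long <- pivot_longer(dt, cols = -Salary,
                        names_to = "key", values_to = "value")

# One panel per People column, Salary on the x axis in each
p <- ggplot(dt_long, aes(x = Salary, y = value)) +
  geom_point() +
  facet_grid(. ~ key)
print(p)
```

This also accounts for the "value" and "variable" names that show up in example code: they are just the column names produced by the reshaping step (reshape2::melt(), for instance, names them variable and value by default).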

jay.sf · Answer 2 · 2019-03-01T07:23:28.943


You need to stack() your data first; that is probably what you have "seen".

dt <- setNames(stack(dt), c("value", "Salary"))

library(ggplot2)
ggplot(dt, aes(x=Salary, y=value)) +
  geom_point() + 
  facet_grid(.~Salary)


edited Mar 01 '19 at 07:23

answered Mar 01 '19 at 07:16
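
Note that the stack() call in this answer stacks every column, Salary included, so the column it names Salary actually holds the original column names. If the goal is Salary on the x axis against each People column, one possible base-R variant is sketched below. The names stacked, dt_long and key are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(0)
dt <- data.frame(matrix(rnorm(120, 100, 5), ncol = 6))
colnames(dt) <- c("Salary", paste0("People", 1:5))

# Stack only the People columns. stack() returns the columns
# `values` and `ind`, which are renamed here for readability.
stacked <- setNames(stack(dt[-1]), c("value", "key"))

# stack() appends the columns one after another, so repeating Salary
# once per stacked column keeps each row aligned with its Salary.
dt_long <- data.frame(Salary = rep(dt$Salary, times = ncol(dt) - 1),
                      stacked)

p <- ggplot(dt_long, aes(x = Salary, y = value)) +
  geom_point() +
  facet_grid(. ~ key)
print(p)
```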


Thanks Jay, that's really helpful. I have another quick question: is it possible to display the graphs into various rows? For instance, for the graph above, only display two graphs per row, then two more the next row, then the last graph. – Rachel Zhang Mar 01 '19 at 14:01
Yes, you could take a look into `grid.arrange` as described in [this](https://stackoverflow.com/a/45614824/6574038) answer. – jay.sf Mar 01 '19 at 15:28
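
Besides grid.arrange, ggplot2's own facet_wrap() can lay the panels out over several rows. Below is a minimal sketch, assuming the data has already been reshaped to long format with a key column identifying each People variable (the names dt_long and key are illustrative). With five panels and ncol = 2 this gives two panels, two panels, then one.

```r
library(ggplot2)

set.seed(0)
dt <- data.frame(matrix(rnorm(120, 100, 5), ncol = 6))
colnames(dt) <- c("Salary", paste0("People", 1:5))

# Long format: one row per (Salary, People column) pair
dt_long <- data.frame(
  Salary = rep(dt$Salary, times = ncol(dt) - 1),
  setNames(stack(dt[-1]), c("value", "key"))
)

# facet_wrap() fills the panels row by row; ncol caps the panels per row
p <- ggplot(dt_long, aes(x = Salary, y = value)) +
  geom_point() +
  facet_wrap(~ key, ncol = 2)
print(p)
```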
