Is it possible to group by two columns, so that each combination of their values (the cross product) forms its own group when drawn by geom_point() and geom_smooth()?

For example:

frame <- data.frame(
  series = rep(c('a', 'b'), 6),
  sample = rep(c('glass', 'water', 'metal'), 4),
  data = 1:12)

ggplot(frame, aes()) # ...

The goal is that points 6 and 12 share a group (both are series b, sample metal), while point 3 (series a, sample metal) falls in a different group.

Captain Obvious
Reactormonk

3 Answers


Taking the example from this question, use interaction() to combine the two columns into a single new factor and map it to the group aesthetic:

# Data frame with two continuous variables and two factors 
set.seed(0)
x <- rep(1:10, 4)
y <- c(rep(1:10, 2)+rnorm(20)/5, rep(6:15, 2) + rnorm(20)/5)
treatment <- gl(2, 20, 40, labels=letters[1:2])
replicate <- gl(2, 10, 40)
d <- data.frame(x=x, y=y, treatment=treatment, replicate=replicate)

ggplot(d, aes(x=x, y=y, colour=treatment, shape = replicate,
  group=interaction(treatment, replicate))) + 
  geom_point() + geom_line()

ggplot example
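
Applied to the data from the question, the same idea might look like the sketch below. It assumes the columns are meant to be named `series`, `sample`, and `data` (created with `=` inside `data.frame()`), and it uses the row index as the x position purely for illustration, since the question does not say what belongs on the x axis.

```r
library(ggplot2)

# Data from the question, with columns named via `=` (assumed intent)
frame <- data.frame(
  series = rep(c("a", "b"), 6),
  sample = rep(c("glass", "water", "metal"), 4),
  data   = 1:12
)

# One factor level per series/sample combination
grp <- interaction(frame$series, frame$sample)

# Points 6 and 12 land in the same group; point 3 does not
same_6_12 <- grp[6] == grp[12]
same_6_3  <- grp[6] == grp[3]

# Row index as x is an illustrative assumption, not part of the question
p <- ggplot(frame, aes(x = seq_along(data), y = data,
                       colour = series, shape = sample,
                       group = interaction(series, sample))) +
  geom_point() +
  geom_line()
```

Calling `print(p)` draws the plot; each of the six series/sample combinations gets its own line connecting its two points.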

Community
Blue Magister

For example:

 qplot(round, price, data=firm, group=id, color=id, geom='line') +  
      geom_smooth(aes(group=interaction(size, type)))
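
Since `qplot()` has been deprecated as of ggplot2 3.4.0, a similar plot can be written with `ggplot()`. The sketch below is not the answerer's code: the `firm` data frame is not shown in the answer, so the one built here is entirely made up for illustration, and `method = "lm"` is chosen only to keep the smoother simple on such a small data set.

```r
library(ggplot2)

# Hypothetical stand-in for the `firm` data: 8 firms observed over 10 rounds
set.seed(1)
firm <- expand.grid(round = 1:10, id = factor(1:8))
firm$size  <- ifelse(as.integer(firm$id) <= 4, "small", "large")
firm$type  <- ifelse(as.integer(firm$id) %% 2 == 0, "x", "y")
firm$price <- firm$round + as.integer(firm$id) + rnorm(nrow(firm))

# One line per firm, plus one smoother per size/type combination
p <- ggplot(firm, aes(x = round, y = price)) +
  geom_line(aes(group = id, colour = id)) +
  geom_smooth(aes(group = interaction(size, type)),
              method = "lm", se = FALSE, colour = "black")
```

Setting `group` inside each layer's own `aes()` lets the lines be grouped by `id` while the smoothers are grouped by the combination of `size` and `type`.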
Blue Magister
Davoud Taghawi-Nejad

Why not just paste those two columns together and use that variable as groups?

frame$grp <- paste(frame[,1],frame[,2])

A somewhat more formal way to do this is to use the function interaction(), which builds a factor from the combinations of its arguments.

joran
  • I think you shouldn't modify your `data.frame` for the purpose of a plot. The `plot` should plot your df and not the opposite. – ClementWalter Jun 03 '16 at 11:54
  • I agree, Blue Magister's answer is better. – Jeston May 14 '17 at 12:16
  • @clemlaflemme I think BlueMagister's answer is fine, although I think the distinction in this case is quite minor. But the general position that one should not modify your data frame for a plot is a curious one given your choice to use **ggplot2**, the entire design of which is premised on explicitly structuring your data to work with ggplot's semantics. – joran Aug 28 '17 at 01:28
  • A disadvantage of `paste` is that when the input is a factor, it discards the levels, whereas `interaction` preserves the order of the original factors. This means that the groups are more naturally ordered with the `interaction` approach. – Kota Mori Nov 09 '19 at 23:38
  • A disadvantage of `interaction` is that it can drop `NA`s. Consider the example: `paste(c("a","b", NA, NA), c(1,2,1,2))`, which results in four different grouping values: `"a 1" "b 2" "NA 1" "NA 2"`, while `interaction(c("a","b", NA, NA), c(1,2,1,2))` generates only three. – fabern May 03 '21 at 14:06
  • You can use `paste` within `aes()`, as in `aes(group = paste(treatment, replicate))`, and never modify `d`. – Nate Jul 24 '23 at 11:44
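
The two trade-offs raised in the comments (level order and `NA` handling) can be checked in base R without plotting anything. The vectors below are small made-up examples, apart from the `NA` case, which reuses the values from the comment above.

```r
# Factors with a deliberate, non-alphabetical level order (illustrative data)
treatment <- factor(c("low", "high", "low", "high"), levels = c("low", "high"))
replicate <- factor(c("1", "1", "2", "2"))

# interaction() keeps the level order of its inputs (first factor varies fastest)
lev_interaction <- levels(interaction(treatment, replicate))

# paste() returns plain strings, so converting to a factor sorts them alphabetically
lev_paste <- levels(factor(paste(treatment, replicate)))

# Missing values: interaction() yields NA whenever any input is NA,
# while paste() turns NA into the string "NA"
by_interaction <- interaction(c("a", "b", NA, NA), c(1, 2, 1, 2))
by_paste       <- paste(c("a", "b", NA, NA), c(1, 2, 1, 2))

n_interaction <- length(unique(by_interaction))  # NA counts as a single value
n_paste       <- length(unique(by_paste))

print(lev_interaction)
print(lev_paste)
print(c(interaction = n_interaction, paste = n_paste))
```

Here `interaction()` keeps "low" ahead of "high" while the pasted strings sort alphabetically, and the two `NA` rows collapse into a single missing value under `interaction()` but stay distinct under `paste()`.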