
I have a grouped boxplot built from data with 3 categorical variables. One is mapped to the x-axis of the boxplots, another to the fill, and the last is used for faceting. I want to display the mean for each fill group, but using stat_summary only gives me one mean per x-axis category, without separating the means by fill:

facetted boxplots

Here is the current code:

demoplot <- ggplot(demo, aes(x=variable, y=value))
demoplot + geom_boxplot(aes(fill=category2), position=position_dodge(.9)) +
  stat_summary(fun.y=mean, colour="black", geom="point", shape=18, size=4) +
  facet_wrap(~category1)

Is there any way to display the mean for each category2 without having to manually compute and plot the points? Adjusting the position dodge doesn't really help, as it's just one computed mean. Would creating conditions within the mean() function be advisable?

For anyone interested, here's the data:

Thanks in advance for any enlightenment on this.

alistaire
dizzygirl
  • your link does not work. geom_box() allows you to compute your own stats (http://docs.ggplot2.org/dev/geom_boxplot.html) – MLavoie Mar 19 '16 at 09:07
  • @MLavoie is the link dead? Not sure why, I used a bit.ly to https://www.dropbox.com/s/mlvx0hu3rwuxtgj/demo.csv?dl=0 I see, are you suggesting I use the `stat` within the geom_boxplot()? – dizzygirl Mar 19 '16 at 09:22
  • if you scroll down you will see that example (just adapt for your example): y <- rnorm(100) df <- data.frame( x = 1, y0 = min(y), y25 = quantile(y, 0.25), y50 = median(y), y75 = quantile(y, 0.75), y100 = max(y) ) ggplot(df, aes(x)) + geom_boxplot( aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100), stat = "identity" ) – MLavoie Mar 19 '16 at 09:23
  • I am not sure about this one; are you recommending that it would be easier to manually create the box plots one by one, by inputting the quartiles in the code? @MLavoie – dizzygirl Mar 19 '16 at 09:37
  • here is an example of what I mean http://stackoverflow.com/questions/34081405/how-to-reproduce-geom-boxplot-default-whiskers-with-stat-identity – MLavoie Mar 19 '16 at 09:48
  • Ohh, I see. Thanks @MLavoie , this is a good method of re-creating the boxplot taking the outliers and whiskers into consideration, but is there any way to overlay the means per fill category above the box plot? I only figured out how to do it for the x-variable. – dizzygirl Mar 19 '16 at 09:58
  • The actual width in _data units_ of the `geom_point`s to be dodged is zero. You need to explicitly _set_ a virtual dodge width for the points as well. You want the dodge width to be the same for the boxes and the points, i.e. use `position_dodge(width = 0.9)` in `stat_summary` as well. See [this answer](http://stackoverflow.com/questions/34889766/what-is-the-width-argument-in-position-dodge/35102486#35102486) for a thorough explanation. – Henrik Mar 19 '16 at 10:00
  • Thanks @Henrik , I tried varying the width of `position=position_dodge` within `stat_summary` , but nothing changed. I suspect that there should be one line of `stat_summary` for each category2, but I do not know how to go about doing that. – dizzygirl Mar 19 '16 at 10:07
  • You need to move `fill` to `ggplot` as well. Then this level of grouping is inherited to `stat_summary` as well. – Henrik Mar 19 '16 at 10:21
  • @Henrik I tried moving `fill=category2` to `ggplot` and it worked just as fine. You are right, putting it in `ggplot` overrides anything else. Thank you! :) – dizzygirl Mar 19 '16 at 10:28

1 Answer


ggplot needs explicit information on grouping here. You can provide it either by using aes(group=...) in the desired layer, or by moving the fill=... to the main call to ggplot(). Without explicit grouping, a layer is grouped by the discrete variables mapped in it, which for the stat_summary layer is only the factor on the x-axis. Here's some sample code with fake data:

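library(ggplot2)
set.seed(123)

# Fake data: three categorical variables and one numeric response
nobs <- 1000
dat <- data.frame(var1=sample(LETTERS[1:3], nobs, TRUE),
                  var2=sample(LETTERS[1:2], nobs, TRUE),
                  var3=sample(LETTERS[1:3], nobs, TRUE),
                  y=rnorm(nobs))

# aes(group=var2) makes stat_summary compute one mean per fill group;
# the dodge width matches the boxplots so the points sit on their boxes.
# ggplot2 >= 3.3.0 uses `fun`; older versions call this argument `fun.y`.
p1 <- ggplot(dat, aes(x=var1, y=y)) +
  geom_boxplot(aes(fill=var2), position=position_dodge(.9)) +
  facet_wrap(~var3) +
  stat_summary(fun=mean, geom="point", aes(group=var2), position=position_dodge(.9),
               color="black", size=4)
p1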
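The other option mentioned above, moving the fill mapping into the main ggplot() call, could look roughly like the sketch below. It assumes ggplot2 3.3.0 or later (for the `fun` argument) and rebuilds the same fake data so it runs on its own; the name `p2` is only for illustration.

```r
library(ggplot2)
set.seed(123)

# Same style of fake data as in the sample code above
nobs <- 1000
dat <- data.frame(var1 = sample(LETTERS[1:3], nobs, TRUE),
                  var2 = sample(LETTERS[1:2], nobs, TRUE),
                  var3 = sample(LETTERS[1:3], nobs, TRUE),
                  y = rnorm(nobs))

# fill is mapped in ggplot(), so every layer inherits it and is grouped by
# var1 x var2; no aes(group = ...) is needed in stat_summary.
p2 <- ggplot(dat, aes(x = var1, y = y, fill = var2)) +
  geom_boxplot(position = position_dodge(0.9)) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 4,
               colour = "black", position = position_dodge(0.9)) +
  facet_wrap(~var3)
p2
```

Because the grouping is inherited, any further layers added to this plot would be split by var2 too, which may or may not be what you want; the per-layer aes(group=...) version keeps that choice local to one layer.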


Heroka
  • Awesome, @Heroka ! So the key is to add `aes(group=category2)` and match the `position_dodge` width. Thank you, this worked perfectly! :) – dizzygirl Mar 19 '16 at 10:20