
The following code produces bar plots with standard error bars using Hmisc, ddply and ggplot:

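library(plyr)     # ddply
library(Hmisc)    # smean.sdl
library(ggplot2)

# mult = 1/sqrt(n) shrinks smean.sdl's SD interval to one standard error
means_se <- ddply(mtcars, .(cyl),
                  function(df) smean.sdl(df$qsec, mult = 1 / sqrt(length(df$qsec))))
colnames(means_se) <- c("cyl", "mean", "lower", "upper")
ggplot(means_se, aes(cyl, mean, ymax = upper, ymin = lower, group = 1)) +
  geom_bar(stat = "identity") +
  geom_errorbar()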

However, implementing the above using helper functions such as mean_sdl seems much better. For example the following code produces a plot with 95% CI error bars:

ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun.y = mean, geom = "bar") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar")

My question is how to use the stat_summary implementation for standard error bars. The problem is that to calculate SE you need the number of observations per condition and this must be accessed in mean_sdl's multiplier.

How do I access this information within ggplot? Is there a neat non-hacky solution for this?

aleph4
  • 1
    Sorry, I don't quite understand what you mean when you write "you need number of observations per condition and this must be accessed in mean_sdl's multiplier". From `?smean.sdl`: "`mult` is the multiplier of the standard deviation used in obtaining a coverage interval about the sample mean. The default is mult=2 to use plus or minus 2 standard deviations". I assume you have seen all the examples [here](http://docs.ggplot2.org/current/stat_summary.html) on `stat_summary` and error bars, which seem to run 'automatically'. – Henrik Oct 08 '13 at 21:33
  • Standard error is SD divided by sqrt(n). As you can see, the mult in my first code snippet does that to get the standard error. However, in ggplot you don't have access to the N for each fold of the data frame because this "summarization" is done internally. In ddply it's easy to "manually" access the folds to query their length (n). How would you do this in stat_summary? – aleph4 Oct 08 '13 at 22:07
  • To be clear, this argument would have to look something like this: stat_summary(fun.data = mean_sdl, mult = sqrt(length(df$qsec))^-1, geom = "errorbar"). The problem is that I can't access df$qsec for each subset of mtcars to get the length – aleph4 Oct 08 '13 at 22:16

1 Answer

Well, I can't tell you how to get a multiplier by group into stat_summary.

However, it looks like your goal is to plot means and error bars that represent one standard error from the mean in ggplot without summarizing the dataset before plotting.

There is a mean_se function in ggplot2 that we can use instead of mean_cl_normal from Hmisc. The mean_se function has a multiplier of 1 as the default so we don't need to pass any extra arguments if we want standard error bars.

ggplot(mtcars, aes(cyl, qsec)) + 
    stat_summary(fun.y = mean, geom = "bar") + 
    stat_summary(fun.data = mean_se, geom = "errorbar")
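
As a rough sketch of what happens per group: `stat_summary` hands the y values at each x to the `fun.data` function, so the group size is available there as `length(x)`. The helper name `mean_se_manual` below is made up for illustration, and `fun = mean` assumes ggplot2 3.3.0 or later (older versions use `fun.y`).

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): receives the y values of one
# group and returns the mean plus/minus one standard error.
mean_se_manual <- function(x) {
  x  <- x[!is.na(x)]               # drop missing values before counting
  se <- sd(x) / sqrt(length(x))    # standard error = SD / sqrt(n)
  data.frame(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)
}

ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se_manual, geom = "errorbar")
```

This should give the same bars as `mean_se`; writing the function by hand is mainly useful if you need a different interval that depends on n.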

If you want to use the mean_cl_normal function from Hmisc, you have to change the multiplier to 1 so that you get one standard error from the mean. mult is an argument of mean_cl_normal, and any arguments for the summary function you are using must be passed as a list via the fun.args argument:

ggplot(mtcars, aes(cyl, qsec)) + 
    stat_summary(fun.y = mean, geom = "bar") + 
    stat_summary(fun.data = mean_cl_normal, geom = "errorbar", fun.args = list(mult = 1))
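
Note that ggplot2 3.3.0 deprecated `fun.y` in favour of `fun`, so on recent versions the call above still runs but emits a deprecation warning. A minimal sketch of the same plot with the newer argument name, assuming ggplot2 3.3.0 or later and that Hmisc is installed (which `mean_cl_normal` requires):

```r
library(ggplot2)  # mean_cl_normal also needs the Hmisc package installed

ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun = mean, geom = "bar") +
  # mult = 1 replaces the default t quantile, giving mean +/- 1 SE
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar",
               fun.args = list(mult = 1))
```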

In pre-2.0 versions of ggplot2, the argument could be passed directly:

ggplot(mtcars, aes(cyl, qsec)) + 
  stat_summary(fun.y = mean, geom = "bar") + 
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", mult = 1) 
slhck
aosmith
  • Great! I was under the impression mean_cl_normal produced 95% CI. What is the default multiplier then if not 1? – aleph4 Oct 15 '13 at 18:37
  • 1
    Based on the help page for `smean.cl.normal` from `Hmisc`, it is the appropriate quantile from the t distribution on n-1 degrees of freedom based on the size of the confidence interval (which defaults to 95%). So it is `mult=qt((1+conf.int)/2,n-1)`. – aosmith Oct 15 '13 at 20:56
  • 4
    According to this [intro](http://www.ling.upenn.edu/~joseff/rstudy/summer2010_ggplot2_intro.html), `mean_cl_normal ` returns "sample mean and 95% confidence intervals assuming normality." – Michael Mauderer Mar 26 '15 at 14:32
  • @aosmith: thanks for your solution! It was working for me until I updated to the latest version of ggplot2 (2.0.0). Now, I cannot use mean_cl_normal to calculate standard error bars anymore. Has anyone gotten around this problem? – Sol Dec 24 '15 at 13:14
  • 2
    @SolLago I updated the answer to give a solution with the current version of *ggplot2* – aosmith Dec 24 '15 at 15:52
  • This is such an amazing functionality. Thanks for your answer! I just tried to reorganize it a little so that future readers find the most up-to-date info first. – slhck Feb 14 '19 at 13:46