
I'm trying to create a barplot with confidence-interval error bars using ggplot. I have a variable, Q1, with 7 answer options, and I want to plot the percentage of respondents choosing each option, split by two groups (One and Two): that is, the percentage of subjects in each group that selected each of the 7 answer options.

I've tried adding y = count, y = prop or y = ..prop.. to aes in ggplot, but none of them seems to work. Any suggestions are appreciated.

df5 <- filter(df, Q1!="-99",df$Group=="One"|df$Group=="Two") 

ggplot(data = df5, aes(x = Q1)) + 
 stat_summary(fun.y = mean, geom = "bar") +
 stat_summary(fun.data = mean_cl_boot, geom = "errorbar", fun.args = list(mult = 1)) +
    geom_bar(aes(label= scales::percent(..prop..),
                   y= ..prop..,fill = df5$Group), position = "dodge")

Error: stat_summary requires the following missing aesthetics: y.

I'm essentially trying to get something that looks like this, with the error bars representing confidence intervals.

[Image: example of barplot with error bars]

IG114

1 Answer

Please note that there is a better way to write your first selection:

df5 <- df %>% filter(Q1!="-99", Group %in% c("One", "Two"))

I recommend computing the statistics explicitly before making the graph. The function DescTools::MultinomCI() will do the job (see its documentation).

# Reproducible example: random
library(tidyverse)
n <- 1000
df5 <- tibble(
    Q1 = sample(letters[1:7], n, replace=TRUE),
    Group = sample(c("One","Two"), n, replace=TRUE)
    )

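library(DescTools)
df_stats <- df5 %>% 
    count(Group, Q1) %>%   # n = number of respondents per Group and answer
    group_by(Group) %>% 
    do({
        df_grp <- .        # counts for the current group
        df_grp %>% 
            select(Q1, n) %>%
            # MultinomCI() returns the columns est, lwr.ci and upr.ci
            bind_cols(as_tibble(MultinomCI(df_grp$n))) %>% 
            rename(prop = est)
    })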
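The per-group step above can also be sketched with group_modify() instead of do(), assuming a dplyr version that provides it. The counts below are made up for illustration (they are not the asker's data), and df_stats2 is a hypothetical name chosen so it does not overwrite df_stats.

```r
library(dplyr)
library(tibble)
library(DescTools)

# Hypothetical counts: 7 answer options in each of 2 groups (100 respondents each)
counts <- tibble(
  Group = rep(c("One", "Two"), each = 7),
  Q1    = rep(letters[1:7], times = 2),
  n     = c(12, 30, 25, 8, 10, 9, 6,
            20, 15, 18, 22, 5, 11, 9)
)

# Within each group, MultinomCI() takes the vector of counts and returns
# a matrix with the columns est, lwr.ci and upr.ci (one row per answer)
df_stats2 <- counts %>%
  group_by(Group) %>%
  group_modify(~ bind_cols(.x, as_tibble(MultinomCI(.x$n)))) %>%
  ungroup() %>%
  rename(prop = est)

print(df_stats2)
```

The result has one row per Group and Q1 combination, with prop equal to n divided by the group total, so it should plug into the same plotting code as df_stats.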

If you want to use bar plots:

df_stats %>% 
    ggplot(aes(Q1, y=prop, ymin=lwr.ci, ymax=upr.ci, fill=Group)) + 
    geom_col(position="dodge") + 
    geom_errorbar(position="dodge") + 
    ylim(0, NA)

(Note that axes of barplots should always start from zero, hence the use of ylim)

However, to highlight between-group differences in the answers, a line plot is much more readable:

df_stats %>% 
    ggplot(aes(Q1, y=prop, ymin=lwr.ci, ymax=upr.ci, color=Group, group=Group)) + 
    geom_line() + 
    geom_errorbar(position="dodge", width=.2) + 
    ylim(0, NA)

[Image: resulting plot, using geom_line]

Pierre Gramme
  • Thank you @Pierre Gramme, but creating the barplot isn't the problem. The problem is when I try to add the error bars with confidence intervals. Your solution doesn't help fix the stat_summary error. – IG114 Oct 28 '19 at 13:24
  • `stat_summary` is meant to compute per-group statistics with one grouping variable and one numerical variable of interest (e.g. mean and ±2 SD of age per answer to Q1). In this case you are only interested in the frequency of the grouping variable. See above for a solution. – Pierre Gramme Oct 28 '19 at 14:32
  • Thank you! When I try creating df_stats as above, I get the following error: ```Error in if (b < 0) b <- 0 : missing value where TRUE/FALSE needed``` – IG114 Oct 28 '19 at 15:06
  • Strange. Does it work with the dummy example I wrote? – Pierre Gramme Oct 28 '19 at 15:31
  • No, running your example gives me ```Error: stat_count() must not be used with a y aesthetic.``` – IG114 Oct 28 '19 at 16:24
  • my bad: error in the copy-paste for the last code block (now fixed, using geom line) – Pierre Gramme Oct 28 '19 at 17:40
  • The means of what? E.g. the mean age of participants? Then use stat_summary, just as you were trying initially. – Pierre Gramme Nov 02 '19 at 07:56
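
To illustrate the comments about stat_summary: it needs a numeric y to summarise, which is why the original call failed with only x = Q1 mapped. Below is a minimal sketch using a made-up numeric column, age, that is not part of the asker's data. It assumes ggplot2 3.3.0 or later, where the argument is called fun (older versions use fun.y), and it uses mean_se with mult = 1.96 as a normal-approximation interval rather than mean_cl_boot, which additionally needs the Hmisc package.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: a numeric variable (age) recorded alongside the answer to Q1
df_age <- data.frame(
  Q1  = sample(letters[1:7], 200, replace = TRUE),
  age = round(runif(200, min = 18, max = 80))
)

p <- ggplot(df_age, aes(x = Q1, y = age)) +
  # Bar height = mean age per answer
  stat_summary(fun = mean, geom = "bar") +
  # Error bars = mean +/- 1.96 standard errors (approximate 95% interval)
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "errorbar", width = 0.3)

print(p)
```

For proportions of a categorical variable, as in the question, there is no numeric y to average, so the intervals have to be computed beforehand as in the answer.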