2

I wanted to build on the great answer I got to a previously asked question:

Graph proportion within a factor level rather than a count in ggplot2

I was hoping to build on the code:

var1 <- c("Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left","Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left")
var2 <- c("Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", NA, "Slightly lower","Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly lower", "Higher", "Higher", "Higher", NA, "Slightly lower")
df <- as.data.frame(cbind(var1, var2))

library(dplyr)
library(ggplot2)

df %>%
  na.omit() %>%
  group_by(var1, var2) %>%
  summarise(n = n()) %>%
  mutate(n = n/sum(n)) %>%
  ungroup() %>%
  ggplot() + aes(var2, n, fill = var1) + 
  geom_bar(position = "dodge", stat = "identity") + 
  labs(x="Left or Right",y="Count")+
  scale_y_continuous() +
  scale_fill_discrete(name = "Answer:")+ theme_classic()+ 
  theme(legend.position="top")  +
  scale_fill_manual(values = c("black", "red"))

To add error bars in the form of 95% confidence intervals to each bar on my graph. I have tried to add in the term

upperE=(1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n), lowerE=(-1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n).

But alas I keep getting errors...

I also tried making an entirely new dataframe for the graph, thus:

var1 <- c("Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left","Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left")
var2 <- c("Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", NA, "Slightly lower","Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly lower", "Higher", "Higher", "Higher", NA, "Slightly lower")
df <- as.data.frame(cbind(var1, var2))



dat <- df %>%
  na.omit() %>%
  group_by(var1, var2) %>%
  summarise(n = n()) %>%
  mutate(prop = n/sum(n),upperE=1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n, lowerE=-1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n)

test <- ggplot(dat, aes(x=var2, y = prop, fill = var1))+ 
  geom_bar(position = "dodge", stat = "identity") + geom_errorbar(aes(ymin = lowerE, ymax = upperE),position="dodge")+
  labs(x="Answer",y="Proportion")+
  scale_fill_discrete(name = "Condition:")+ theme_classic()+ 
  theme(legend.position="top") 

Which gives me error bars but positioned at 0 on the Y-axis not on top of each bar...

enter image description here

Does anyone have any suggestions? Thank you!

Sarah
  • 789
  • 3
  • 12
  • 29

3 Answers3

1

I have now worked out how to get the error bars to sit at the appropriate position on each bar - I needed to associate the ymin and ymax specification of the error bar with the values being plotted, thus:

dat <- df %>%
  na.omit() %>%
  group_by(var1, var2) %>%
  summarise(n = n()) %>%
  mutate(prop = n/sum(n),upperE=1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n, lowerE=-1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n)

test <- ggplot(dat, aes(x=var2, y = prop, fill = var1))+ 
  geom_bar(position = "dodge", stat = "identity") + geom_errorbar(aes(ymin = prop+lowerE, ymax = prop+upperE),width = .2, position=position_dodge(.9))+
  labs(x="Answer",y="Proportion")+
  scale_fill_discrete(name = "Condition:")+ theme_classic()+ 
  theme(legend.position="top") 

Which gave:

enter image description here

Sarah
  • 789
  • 3
  • 12
  • 29
0

The formula for the SE in the 95%CI in proportions is: se = sqrt((p * (1-p))/n. So I think in the solution above it is stated: sqrt(n/sum(n) * 1-(n/sum(n))/n). However, n there is only the count of successes. The full sample is sum(n). So it actually should be sqrt(n/sum(n) * (1-(n/sum(n))/**sum**(n)).

Timus
  • 10,974
  • 5
  • 14
  • 28
4352sdf
  • 1
  • 1
0

Super old thread, but just in case somebody still stumbles upon this: the formula for the confidence intervals in the upvoted answer is incorrect.

It should be:

mutate(prop = n/sum(n),
         upperE=1.96*sqrt(n/sum(n)*(1-(n/sum(n)))/sum(n)), 
         lowerE=-1.96*sqrt(n/sum(n)*(1-(n/sum(n)))/sum(n)))

. With the formula that you used for the confidence intervals, you only take the square root of the first bit of the formula. However, you need to take the square root of the entire formula (except for the Z score).