
I have a barplot where the exact bar heights are in the dataframe.

df <- data.frame(x=LETTERS[1:6], y=c(1:6, 1:6 + 1), g=rep(x = c("a", "b"), each=6))

ggplot(df, aes(x=x, y=y, fill=g, group=g)) + 
  geom_bar(stat="identity", position="dodge")

[Figure: dodged bar plot of y by x, filled by g]

Now I want to add two hlines displaying the mean of all bars per group. All I get with

ggplot(df, aes(x=x, y=y, fill=g, group=g)) + 
  geom_bar(stat="identity", position="dodge") +
  stat_summary(fun.y=mean, aes(yintercept=..y.., group=g), geom="hline")

is

[Figure: plot produced by the code above]

As I want to do this for an arbitrary number of groups as well, I would appreciate a solution that uses ggplot only.

I want to avoid a solution like the following, because it does not rely purely on the dataset passed to ggplot, contains redundant code, and does not adapt to the number of groups:

ggplot(df, aes(x=x, y=y, fill=g, group=g)) + 
  geom_bar(stat="identity", position="dodge") +
  geom_hline(yintercept=mean(df$y[df$g=="a"]), col="red") +
  geom_hline(yintercept=mean(df$y[df$g=="b"]), col="green")

Thanks in advance!

Edits:

  • added dataset
  • comment on resulting code
  • changed the data and plots to clarify the question
c0bra

1 Answer


If I understand your question correctly, your first approach is almost there:

ggplot(df, aes(x = x, y = y, fill = g, group = g)) + 
  geom_col(position="dodge") + # geom_col is equivalent to geom_bar(stat = "identity")
  stat_summary(fun.y = mean, aes(x = 1, yintercept = ..y.., group = g), geom = "hline")

[Figure: bar plot with one horizontal mean line per group]

According to the help file for stat_summary:

stat_summary operates on unique x; ...

In this case, stat_summary has inherited the top level aesthetic mappings of x = x and group = g by default, so it would calculate the mean y value at each x for each value of g, resulting in a lot of horizontal lines. Adding x = 1 to stat_summary's mapping overrides x = x (while retaining group = g), so we get a single mean y value for each value of g instead.
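For readers on a newer ggplot2, here is a minimal sketch of the same idea, assuming ggplot2 >= 3.3.0, where `fun` replaces `fun.y` and `after_stat(y)` replaces the `..y..` notation. Both spellings mark `y` as a value computed by the stat rather than a column of the original data. Mapping `colour = g` and the `group_means` check at the end are my own additions for illustration, not part of the original answer.

```r
library(ggplot2)

# Same toy data as in the question
df <- data.frame(x = LETTERS[1:6],
                 y = c(1:6, 1:6 + 1),
                 g = rep(c("a", "b"), each = 6))

p <- ggplot(df, aes(x = x, y = y, fill = g, group = g)) +
  geom_col(position = "dodge") +
  stat_summary(
    # x = 1 overrides the inherited x = x, so the mean is taken
    # once per group instead of once per bar.
    # after_stat(y) refers to the y value computed by the stat.
    # colour = g is an added assumption to tell the lines apart.
    aes(x = 1, yintercept = after_stat(y), group = g, colour = g),
    fun = mean,
    geom = "hline"
  )

# Inspect what the second layer (the hlines) actually computed;
# this should be the per-group means of y.
group_means <- layer_data(p, 2)$yintercept
print(group_means)
```

Printing `p` draws the plot. Because the summary is computed from the data passed to `ggplot()`, the same code should work for any number of levels in `g`.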

Z.Lin
  • Hey! Just to chime in, when doing this with a datetime variable on the x-axis, setting `x=1` or `x=NULL` or `x=lubridate::today()` all result in `Error: Invalid input: time_trans works with objects of class POSIXct only`. Any ideas? – Japhir Mar 30 '20 at 12:30
  • replace `x=1` with `x=as.POSIXct("2020-01-01")` – Joost Keuskamp Dec 03 '21 at 13:56
  • or even better: replace `x=1` with `mean(.data[[x]],na.rm=TRUE)`, so that the reference point falls within your data – Joost Keuskamp Dec 03 '21 at 14:18
  • Why are dots added to `..y..` in `stat_summary`? – Ed_Gravy Sep 09 '22 at 12:46