Plotting an average line in ggplot.

I have the following data:

region <- structure(list(Region.in.country = c("Andalucia", "Aragon", "Asturias", 
"Canary Islands", "Cantabria", "Castilla-La Mancha", "Castilla y Leon", 
"Cataluna", "Comunidad Valenciana", "Extremadura", "Galicia", 
"Islas Baleares", "La Rioja", "Madrid", "Murcia", "Navarra", 
"Pais Vasco"), count = c(540L, 117L, 74L, 362L, 36L, 150L, 299L, 
952L, 797L, 72L, 283L, 353L, 39L, 1370L, 302L, 46L, 255L)), .Names = c("Region.in.country", 
"count"), row.names = c(NA, -17L), class = c("tbl_df", "tbl", 
"data.frame"), na.action = structure(18L, .Names = "18", class = "omit"))

I am trying to add an average line across the bar plot in ggplot2. The line should sit at the average of the count column over the 17 regions:

sum(region$count) / 17

 ggplot(data = region, aes(x = Region.in.country, y = count)) +
   geom_bar(stat="identity") +
   geom_line(data = region, aes(355.7059)) +
   coord_flip()

The above code returns an error.

EDIT: [image of my graphic]

user113156
  • I am not getting any error with your data – akrun Dec 28 '17 at 19:59
  • Try `geom_hline(aes(yintercept = 355.7059))` instead of `geom_line(...)`. (Alas, this task is not as quixotic as it appears; one can't help but notice La Mancha in the `Region.in.country` column.) – bouncyball Dec 28 '17 at 20:02
  • Really... I have edited the original post and included my graphic – user113156 Dec 28 '17 at 20:03
  • Great! Thanks bouncyball, it worked perfectly! One additional question: is it possible to put the equation in the geom_hline, e.g. `geom_hline(aes(yintercept = sum(region$count) / nrow(region)))`? – user113156 Dec 28 '17 at 20:06
  • Since you specified `data` in your `ggplot` call, you can succinctly type: `geom_hline(aes(yintercept = mean(count)))` – bouncyball Dec 28 '17 at 20:07
  • Exactly what I wanted, thanks! I tried the same thing but with `geom_line`, which gave me the errors – user113156 Dec 28 '17 at 20:08

2 Answers


This should do the job. Credit to bouncyball for suggesting `aes(yintercept = mean(count))` instead of `yintercept = 355.7059`.

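  library(ggplot2)

  # reorder() sorts the regions by count; geom_hline() draws the mean of count
  ggplot(region, aes(x = reorder(Region.in.country, count), y = count)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    xlab("Region") +
    ylab("Counts") +
    geom_hline(aes(yintercept = mean(count)))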

If you want the bars ordered by their numeric value, wrap the axis variable in `reorder()`. Sorting the data beforehand with `arrange()` or `sort()` does not change the plot: without `reorder()`, the bars follow the id variable, `Region.in.country`, in alphabetical order (as shown in the other answer, posted after this one).
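
As a rough illustration of why sorting the rows does not help, the sketch below compares the default alphabetical ordering with the levels produced by `reorder()`. It uses three rows taken from the question's data; the `demo` data frame and the `lv_*` variable names are made up for this example.

```r
# Three rows from the question's data (the name `demo` is hypothetical)
demo <- data.frame(
  Region.in.country = c("Aragon", "Madrid", "Cantabria"),
  count = c(117L, 1370L, 36L),
  stringsAsFactors = FALSE
)

# Sorting the rows changes the row order only
demo_sorted <- demo[order(demo$count), ]

# factor() on a character column gives alphabetically sorted levels,
# regardless of row order; a discrete axis orders character values the same way
lv_default <- levels(factor(demo_sorted$Region.in.country))

# reorder() sets the levels by increasing count instead
lv_reordered <- levels(reorder(demo$Region.in.country, demo$count))

print(lv_default)
print(lv_reordered)
```

Here `lv_default` stays alphabetical even though the rows were sorted by count, while `lv_reordered` follows the counts, which is the order the bars take when `reorder()` is used inside `aes()`.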

InfiniteFlash
  • Yes, this works perfectly too. Thank you. I like bouncyball's solution of adding `geom_hline(aes(yintercept = mean(count)))` instead of the number 355.7059 – user113156 Dec 28 '17 at 20:11
  • Let me edit my post to accommodate your adjustment (will give credit at the top). Nice. – InfiniteFlash Dec 28 '17 at 20:11
  • Another interesting approach would be the `stat_summary()` function, as in `ggplot2::stat_summary(fun.y = mean, ggplot2::aes(x = 1, yintercept = ..y..), geom = "hline")`. More details can be found [on this SO question](https://stackoverflow.com/questions/52330348/plot-hline-at-mean-with-geom-bar-and-stat-identity) – xav Jun 14 '19 at 10:49

Use geom_hline as follows:

library(ggplot2)
library(magrittr)  # provides %>%

# Compute the average once, outside the plot
avg <- mean(region$count)

region %>%
  ggplot(aes(x = Region.in.country, y = count)) +
  geom_bar(stat = "identity") +
  geom_hline(aes(yintercept = avg)) +
  coord_flip()

result of the code
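
A possible variation, not part of the original answer: because the average is a single number, it can be passed to `geom_hline()` as a constant `yintercept` rather than through `aes()`, and `geom_col()` can stand in for `geom_bar(stat = "identity")`. The sketch assumes ggplot2 is installed and rebuilds `region` as a plain data frame from the question's values; the dashed line type and the name `p` are illustrative choices only.

```r
library(ggplot2)

# The question's data, rebuilt as a plain data frame
region <- data.frame(
  Region.in.country = c(
    "Andalucia", "Aragon", "Asturias", "Canary Islands", "Cantabria",
    "Castilla-La Mancha", "Castilla y Leon", "Cataluna",
    "Comunidad Valenciana", "Extremadura", "Galicia", "Islas Baleares",
    "La Rioja", "Madrid", "Murcia", "Navarra", "Pais Vasco"
  ),
  count = c(540L, 117L, 74L, 362L, 36L, 150L, 299L, 952L, 797L,
            72L, 283L, 353L, 39L, 1370L, 302L, 46L, 255L)
)

# Same value as sum(region$count) / nrow(region)
avg <- mean(region$count)

p <- ggplot(region, aes(x = Region.in.country, y = count)) +
  geom_col() +                                        # bars from the count values
  geom_hline(yintercept = avg, linetype = "dashed") + # constant, so no aes() needed
  coord_flip()

print(p)
```

With a constant `yintercept` the layer holds a single line, whereas mapping `mean(count)` inside `aes()` evaluates against the inherited data and repeats the same line once per row. The two look the same here, so which form to use is mostly a matter of taste.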

Tamás Sengel
AlphaDrivers