
I'm trying to create a stacked bar plot from raw data, where each combination of factor levels may have multiple entries and the y-value should be the sum of all such entries. A plain geom_bar looks fine at first, but it draws each original row as a separate rectangle, stacked on top of one another. That looks okay until you draw a frame around each part of the bar:

library(tidyverse)
data = tibble(
    age = factor(c(2, 3, 3, 3, 2, 2)),
    value = c(30, 5, 15, 14, 29, 9)
)
ggplot(data, aes(x = "Observation", y = value, fill = age)) +
    geom_bar(stat = "identity", colour = "black")


What I actually want is one frame around the turquoise part and one frame around the red part. How can I do this with ggplot directly?

Of course, one can manually call summarize:

ggplot(data %>% group_by(age) %>% summarize(value = sum(value)), 
  aes(x = "Observation", y = value, fill = age)) +
  geom_bar(stat = "identity", colour = "black")

But the grouping has to be adjusted for every different choice of axis variables, which will be a pain: I'm working with ~15 factor dimensions and have to create dozens of charts, each with different factor variables on the axes (including facet_grid).

So ideally, ggplot / geom_bar would automatically do the aggregation and then draw the aggregated value rather than each individual entry separately. Is this possible?



stat_summary() can handle these types of summaries on the fly.

You just specify the geometry and the summary function. Here we also need to explicitly ask for stacking with position = "stack", because stat_summary() defaults to position = "identity" and the bars would otherwise overlap.

ggplot(data, aes(x = "Observation", y = value, fill = age)) +
  # sum the values of each segment; in ggplot2 < 3.3.0 this argument was called fun.y
  stat_summary(geom = "bar", fun = "sum", position = "stack")

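To check that stat_summary() really collapses the six rows into one rectangle per age group, one can inspect the data ggplot2 computed for the layer with layer_data(). This is a minimal sketch assuming ggplot2 3.3.0 or later (where fun replaced fun.y); the object names p and computed are just illustrative. It should show two rows, one per age level, with segment heights equal to the group sums (30 + 29 + 9 = 68 for age 2 and 5 + 15 + 14 = 34 for age 3).

```r
library(tidyverse)

data <- tibble(
  age = factor(c(2, 3, 3, 3, 2, 2)),
  value = c(30, 5, 15, 14, 29, 9)
)

p <- ggplot(data, aes(x = "Observation", y = value, fill = age)) +
  stat_summary(geom = "bar", fun = "sum", position = "stack")

# Data as computed by ggplot2 for the first (and only) layer:
# one row per x/fill group instead of one row per original observation
computed <- layer_data(p)

# After stacking, ymax - ymin is the summed value of each segment
computed[, c("fill", "group", "ymin", "ymax")]
```

Because the aggregation happens inside the stat, the same layer keeps working when other factors are mapped to x, fill, or used in facet_grid(); no per-chart summarize() call is needed.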

Nate
  • Wow, that was fast. It's probably worth mentioning that stat_summary passes any additional arguments on to the given geom, so one can e.g. use colour = "black" or width = 0.8 in stat_summary and they will be handed to the bar geom. – Reinhold Kainhofer Jun 14 '19 at 13:20
  • Another thing: to get stacked percentage bars (similar to geom_bar(position = "fill", stat = "identity")), one can simply use position = "fill" instead of "stack". – Reinhold Kainhofer Jun 14 '19 at 14:10
  • @ReinholdKainhofer - you could have made a similar graph with your original code by removing the colour argument. However, as you note, with this answer you can use stat_summary(..., colour = 'black') and get a black border around each of the two age groups. – Cole Jun 15 '19 at 12:39
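The suggestions from the comments can be combined into a short sketch: extra arguments such as colour and width are forwarded from stat_summary() to the bar geom, giving one black outline per aggregated segment, and position = "fill" turns the stacked sums into percentage bars. It assumes ggplot2 3.3.0 or later (fun rather than fun.y); p_stack and p_fill are illustrative names only.

```r
library(tidyverse)

data <- tibble(
  age = factor(c(2, 3, 3, 3, 2, 2)),
  value = c(30, 5, 15, 14, 29, 9)
)

# colour and width are not used by the stat; they are passed on to the
# bar geom, so each summed segment gets a single black frame
p_stack <- ggplot(data, aes(x = "Observation", y = value, fill = age)) +
  stat_summary(geom = "bar", fun = "sum", position = "stack",
               colour = "black", width = 0.8)

# position = "fill" rescales each stack so its segments add up to 1
p_fill <- ggplot(data, aes(x = "Observation", y = value, fill = age)) +
  stat_summary(geom = "bar", fun = "sum", position = "fill",
               colour = "black")

p_stack
p_fill
```

The fill version shows the same two framed segments, with heights 34/102 and 68/102 of the full bar.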