Here is some example data:

gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), levels = c(0,1), labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

I want to create a ggplot with the percentage on the y-axis, gender on the x-axis, and outcome as the fill. It has to be a stacked bar chart with percentage labels inside the bars.

I tried this code:

ggplot(df, aes(x = gender, fill = outcome)) + geom_bar()

But this gives me the count on the y-axis, and I want the percentage instead. The stacked female bar must show the percentage of responders and non-responders within the female group, not the percentage of the total population made up by female responders or non-responders. E.g., I would like to see 40% female responders vs 60% non-responders, and similarly for males.

To make this ready for publication, I also need to add these percentages as labels inside the stacked bars.

Pashtun

3 Answers

Here is a version with the labels:

library(ggplot2)
gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1),  labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

ggplot(df, aes(x = gender)) +
  # The factor 2 rescales shares of the total to shares within each gender;
  # it is only valid here because both genders have 5 observations.
  geom_bar(aes(y = after_stat(2 * count / sum(count)), fill = outcome, group = outcome),
           stat = "count") +
  geom_label(aes(label = after_stat(scales::percent(2 * count / sum(count))), group = outcome),
             position = "fill", stat = "count", vjust = 0) +
  labs(y = "Percent", fill = "outcome") +
  scale_y_continuous(labels = scales::percent)

It seems that @Paul has a better way for the geom_bar.

EDIT

Here is a general solution:

library(ggplot2)
gender <- c("female", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1),  labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

gg <- ggplot() + 
  geom_bar(aes(x= gender, fill = outcome), data = df, position = "fill")
ggb <- ggplot_build(gg)
df2 <- data.frame(y = ggb$data[[1]][["y"]])

gg + geom_label(
  aes(x = rep(c(1,2), each = 2), label = scales::percent(y), y = y), 
  data = df2
)
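
As a possible alternative to hard-coding the x positions, the within-group share can be computed inside after_stat(). This is only a sketch, not part of the original answer: it assumes ggplot2 >= 3.3.0 and relies on after_stat() seeing the counts of all fill groups at once, so that ave(count, x, FUN = sum) returns the total per bar. With facets, the panel would presumably have to be included in the grouping as well.

```r
library(ggplot2)

# Same unequal groups as above: 6 females, 4 males
gender <- c("female", "female", "male", "male", "female",
            "female", "male", "female", "female", "male")
outcome <- factor(c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1),
                  labels = c("responders", "non-responders"))
df <- data.frame(gender, outcome)

p <- ggplot(df, aes(x = gender, fill = outcome)) +
  geom_bar(position = "fill") +
  geom_text(
    # count / (total count of the same bar) = share within each gender
    aes(label = after_stat(
      scales::percent(count / ave(count, x, FUN = sum), accuracy = 1)
    )),
    stat = "count",
    position = position_fill(vjust = 0.5)  # centre each label in its segment
  ) +
  scale_y_continuous(labels = scales::percent_format())

p
```

Because the totals are computed per bar, no manual ratio between the group sizes is needed.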
Stéphane Laurent
  • Thank you for your response; however, this is slightly different from what I want. I will edit my question to make it clearer. I want the percentages to be computed within the "male" and "female" groups rather than over the total. E.g. 40% of the females are responders and 60% are non-responders, shown in the "female" stack. – Pashtun Aug 19 '21 at 13:33
  • @StéphaneLaurent your answer deals better with labelling the bars, I still struggle to make it work :p – Paul Aug 19 '21 at 14:10
  • @Paul I don't like the `2*` because it only works for the particular reason that there are as many males as females. – Stéphane Laurent Aug 19 '21 at 14:20
  • @StéphaneLaurent thanks for editing your answer. It's a nice addition; however, as you noted yourself, the numbers of females and males are exactly the same in the example dataset but not in my real dataset, which gives the wrong values for the labels. Since I have many plots to make (20+), it would be great if there were a 'general code' to use instead of manually calculating the male:female ratio (which will differ depending on the available data for the outcome). – Pashtun Aug 19 '21 at 14:43
  • @Pashtun I searched for a general solution but haven't found one yet. – Stéphane Laurent Aug 19 '21 at 14:46
  • @Pashtun See my edit. I don't see another way. – Stéphane Laurent Aug 19 '21 at 15:09
  • @StéphaneLaurent I found and posted a solution using a combination of both your and Paul's answers together with something I found here: https://stackoverflow.com/questions/24776200/ggplot-replace-count-with-percentage-in-geom-bar – Pashtun Aug 19 '21 at 18:18
The trick to avoid changing the data is to use geom_bar(position = "fill"), as mentioned here: https://stackoverflow.com/a/48602277/10264278. To format the labels of the y-axis, you have multiple choices. Here are two of them:

  • use the scales package: pass scales::percent_format() as the labels argument
  • use a custom function instead: pass function(x) paste0(x*100, "%") as the labels argument

And here it is:

gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), levels = c(0,1), labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

library(ggplot2)
ggplot(data = df, aes(x = gender, fill = outcome)) +
  geom_bar(position="fill") +
  scale_y_continuous(labels = function(x) paste0(x*100, "%"))

Created on 2021-08-19 by the reprex package (v2.0.0)
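
For completeness, here is a sketch of the first option from the list above, with scales::percent_format() as the labeller instead of the custom function. It assumes the scales package is installed (it is a dependency of ggplot2); the rest of the plot is unchanged.

```r
library(ggplot2)

gender <- c("male", "female", "male", "male", "female",
            "female", "male", "female", "female", "male")
outcome <- factor(c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1), levels = c(0, 1),
                  labels = c("responders", "non-responders"))
df <- data.frame(gender, outcome)

p <- ggplot(data = df, aes(x = gender, fill = outcome)) +
  geom_bar(position = "fill") +
  # percent_format() returns a function that turns 0.25 into "25%"
  scale_y_continuous(labels = scales::percent_format())

p
```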

Paul
  • That's a great answer, thank you! Works great. I tried upvoting it but it says I need at least 15 reputation. Would it also be possible to show the percentages as labels? – Pashtun Aug 19 '21 at 14:21
Managed to find an alternative working answer to the ones posted by Paul and Stéphane (which were both great as well). The advantage of this method is that it is general and can save time when creating many plots.

library(dplyr)
library(ggplot2)

gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), levels = c(0,1), labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

df %>%
  group_by(gender, outcome) %>%
  # drop only the outcome grouping so the data stay grouped by gender
  summarise(count = n(), .groups = "drop_last") %>%
  # share of each outcome within its gender
  mutate(pct = round(count / sum(count), 2)) %>%
  ggplot(aes(x = factor(gender), y = pct, fill = factor(outcome))) +
  geom_bar(stat = "identity", width = 0.7) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = "Sex", y = "Percentage", fill = "Outcome") +
  theme_minimal(base_size = 14) +
  geom_text(aes(label = paste0(pct * 100, "%")), vjust = -0.25, position = position_stack(0.5))
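
Since the same plot is needed for 20+ outcomes, the pipeline above could be wrapped in a function. The sketch below is one way to do that; the name pct_stacked_bar and its arguments are hypothetical, and it assumes dplyr >= 1.0 and a ggplot2 version that supports the {{ }} operator inside aes().

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: stacked bars showing the share of each `fill` level
# within each `x` group, labelled with rounded percentages.
pct_stacked_bar <- function(data, x, fill) {
  data %>%
    count({{ x }}, {{ fill }}, name = "n") %>%  # one row per x/fill combination
    group_by({{ x }}) %>%
    mutate(pct = n / sum(n)) %>%                # share within each x group
    ungroup() %>%
    ggplot(aes(x = {{ x }}, y = pct, fill = {{ fill }})) +
    geom_col(width = 0.7) +
    geom_text(aes(label = scales::percent(pct, accuracy = 1)),
              position = position_stack(vjust = 0.5)) +
    scale_y_continuous(labels = scales::percent_format())
}

gender <- c("male", "female", "male", "male", "female",
            "female", "male", "female", "female", "male")
outcome <- factor(c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1), levels = c(0, 1),
                  labels = c("responders", "non-responders"))
df <- data.frame(gender, outcome)

p <- pct_stacked_bar(df, gender, outcome)
p
```

Column names are passed unquoted, so another outcome column can be plotted by changing only the third argument.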

Pashtun