I have a dataset with multiple columns. I am visually summarizing several columns using simple bar plots. A simple example:

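library(ggplot2)
library(tidyr)

set.seed(123)

df <- 
  data.frame(
    a = sample(1:2, 20, replace = TRUE),
    b = sample(0:1, 20, replace = TRUE)
  )

# Reshape to long form (columns "key" and "value"), then draw one panel per original column
ggplot(gather(df, factor_key = TRUE), aes(x = factor(value))) + 
  geom_bar() + 
  facet_wrap(~ key, scales = "free_x", as.table = TRUE) + 
  xlab("")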

Now I want to add a percentage above each of the 4 bars, showing what percent of the rows in the data frame that bar represents. That is, the following numbers would appear right above the four bars, from left to right: 55%, 45%, 60%, 40%.

How can I automate this, given that I have a large number of columns to do this for? (Note that I want to keep the raw count of responses on the y axis and only have the percentages appear as labels in the plots.)

NelsonGon
lethalSinger
  • Does this answer your question? [How to add percentage or count labels above percentage bar plot?](https://stackoverflow.com/questions/29869862/how-to-add-percentage-or-count-labels-above-percentage-bar-plot) – UseR10085 Apr 21 '20 at 05:25
  • I saw that but am having trouble applying it to the case of multiple columns gathered. – lethalSinger Apr 21 '20 at 05:31

1 Answer

The answer suggested by @BappaDas puts percentages on both the y axis and the text labels, whereas in your case you want to keep the counts on the y axis and show percentages only as labels.

One way to do this is to count each value within each variable and then derive the percentage from those counts. This can be done with tidyr (to reshape the data into "long" form) and dplyr:

library(tidyr)
library(dplyr)

df %>% pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  group_by(var) %>% count(val) %>%
  mutate(Label = n/sum(n))

# A tibble: 4 x 4
# Groups:   var [2]
  var     val     n Label
  <chr> <int> <int> <dbl>
1 a         1    11  0.55
2 a         2     9  0.45
3 b         0    12  0.6 
4 b         1     8  0.4 
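
As a small sketch of the step from the proportion to the displayed text, the same table can carry a ready-made label column. The `Text` column and the `labels` object are names introduced here for illustration only, and `accuracy = 1` is an assumption that whole-number percentages are wanted:

```r
library(dplyr)
library(tidyr)

set.seed(123)
df <- data.frame(
  a = sample(1:2, 20, replace = TRUE),
  b = sample(0:1, 20, replace = TRUE)
)

labels <- df %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  count(var, val) %>%                 # one row per (column, value) pair
  group_by(var) %>%                   # proportions are taken within each column
  mutate(
    Label = n / sum(n),
    Text  = scales::percent(Label, accuracy = 1)  # illustrative column, e.g. a string ending in "%"
  ) %>%
  ungroup()

labels
```

Within each `var`, the `Label` values should sum to 1, which is a quick way to check that the grouping is right before plotting.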

You can then append the ggplot code to the end of this pipe, mapping the count to y and the percentage to label:

library(tidyr)
library(dplyr)
library(ggplot2)

df %>% pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  group_by(var) %>% count(val) %>%
  mutate(Label = n/sum(n))  %>%
  ggplot(aes(x = factor(val), y = n))+
  geom_col()+
  facet_wrap(~var, scales = "free", as.table = TRUE)+
  xlab("")+
  geom_text(aes(label = scales::percent(Label)), vjust = -0.5)

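
Since the question is about repeating this for many columns, the pipeline could be wrapped in a pair of helpers. This is only a sketch: `count_with_pct()` and `plot_counts_with_pct()` are hypothetical names, not part of any package. It assumes every column should be plotted, converts values to character so that columns of different types can be stacked (which means values sort as strings), and adds some headroom on the y axis so the labels are not clipped:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: count each value per column and add its share of rows.
count_with_pct <- function(data) {
  data %>%
    pivot_longer(everything(), names_to = "var", values_to = "val",
                 # assumption: coerce to character so mixed column types combine
                 values_transform = list(val = as.character)) %>%
    count(var, val) %>%
    group_by(var) %>%
    mutate(pct = n / sum(n)) %>%
    ungroup()
}

# Hypothetical helper: counts on the y axis, percentages as text above the bars.
plot_counts_with_pct <- function(data) {
  count_with_pct(data) %>%
    ggplot(aes(x = val, y = n)) +
    geom_col() +
    geom_text(aes(label = scales::percent(pct, accuracy = 1)), vjust = -0.5) +
    # leave extra room above the tallest bar for its label
    scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
    facet_wrap(~ var, scales = "free_x") +
    xlab("")
}
```

With the example data, `plot_counts_with_pct(df)` should produce one panel per column. To plot only some columns, select them first, for example `plot_counts_with_pct(df[c("a", "b")])`.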

dc37