
I would like to draw a bar plot in ggplot2 of one categorical variable grouped by a second categorical variable, and use facet_wrap to split the result into separate panels. Then I would like to show the percentage for each bar. Here is a reproducible example:

test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE), 
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11],100, replace = TRUE )
)


library(ggplot2)
library(scales)  # provides percent() for the y-axis labels

ggplot(test, aes(x=factor(test1))) +
  geom_bar(aes(fill=factor(test2), y=..prop.., group=factor(test2)), position="dodge") +
  facet_wrap(~factor(test3))+
  scale_y_continuous("Percentage (%)", limits = c(0, 1), breaks = seq(0, 1, by=0.1), labels = percent)+
  scale_x_discrete("")+
  theme(plot.title = element_text(hjust = 0.5), panel.grid.major.x = element_blank())

This gives me a bar plot with the percentage of test2 by test1 in each test3 panel. I would like to show the percentage above each bar. I would also like to change the legend title on the right from factor(test2) to Test2.


clemens
ChinaskyM

1 Answer


It may be easiest to summarize the data yourself so that you can create a column with the percentage labels you want. (Note that, as written, I'm not sure what you want your percentages to show: in facet i, group b, there is one column at nearly 90% and two columns at 50% or more. Is that intended?)

Libraries and your example data frame:

library(ggplot2)
library(dplyr)

test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE), 
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11],100, replace = TRUE )
)

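First, group by all columns (note the order), then summarize to count the rows for each test2 value. Because summarize drops the last grouping variable, the result is still grouped by test1 and test3, so the mutate step computes each percentage within a test1/test3 combination. That percentage is used for both the column height and the label; here I've multiplied by 100 and rounded to one decimal place.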

test.grouped <- test %>%
  group_by(test1, test3, test2) %>%
  summarize(t2.len = length(test2)) %>%
  mutate(t2.prop = round(t2.len / sum(t2.len) * 100, 1))

> test.grouped
# A tibble: 18 x 5
# Groups:   test1, test3 [6]
    test1  test3  test2 t2.len t2.prop
   <fctr> <fctr> <fctr>  <int>   <dbl>
 1      a      i      c      4    30.8
 2      a      i      d      5    38.5
 3      a      i      e      4    30.8
 4      a      j      c      3    20.0
 5      a      j      d      8    53.3
...
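
To see why the grouping order matters, here is a minimal sketch that inspects the grouping left over after summarize and the denominator used for each percentage. The `set.seed` call, the `n()` counting idiom, the `.groups` argument (which assumes dplyr 1.0.0 or later), and the name `denominators` are additions for illustration and are not part of the answer above.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11], 100, replace = TRUE)
)

# Count rows per test1/test3/test2 combination; "drop_last" makes the
# default behaviour explicit: only the last grouping variable is dropped.
test.grouped <- test %>%
  group_by(test1, test3, test2) %>%
  summarize(t2.len = n(), .groups = "drop_last")

# The remaining groups are what sum(t2.len) is computed over in mutate()
group_vars(test.grouped)  # "test1" "test3"

# Denominator for each test1/test3 combination (hypothetical helper table)
denominators <- test.grouped %>%
  summarize(n.total = sum(t2.len), .groups = "drop")
print(denominators)
```

If test2 were not listed last in `group_by`, a different variable would be dropped and the percentages would be taken over a different set of bars.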

Use the summarized data to build your plot, with geom_text taking the proportion column as the label:

ggplot(test.grouped, aes(x = test1, 
                         y = t2.prop, 
                         fill = test2, 
                         group = test2)) +  
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = paste(t2.prop, "%", sep = ""), 
                group = test2), 
            position = position_dodge(width = 0.9),
            vjust = -0.8)+
  facet_wrap(~ test3) + 
  scale_y_continuous("Percentage (%)") +
  scale_x_discrete("") + 
  theme(plot.title = element_text(hjust = 0.5), panel.grid.major.x = element_blank())
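
The question also asks for the legend title to read Test2. One way to do that, sketched here on a small made-up data frame rather than the real summary, is to set the title for the fill aesthetic with `labs()`. The data frame `toy` and its values are hypothetical stand-ins for `test.grouped`, and `geom_col()` is used as the equivalent of `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical pre-summarized data standing in for test.grouped
toy <- data.frame(
  test1   = rep(c("a", "b"), each = 3),
  test2   = rep(c("c", "d", "e"), times = 2),
  test3   = "i",
  t2.prop = c(30, 40, 30, 25, 50, 25)
)

p <- ggplot(toy, aes(x = test1, y = t2.prop, fill = test2, group = test2)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = paste0(t2.prop, "%")),
            position = position_dodge(width = 0.9),
            vjust = -0.8) +
  facet_wrap(~ test3) +
  # fill = "Test2" sets the legend title; x = NULL removes the x-axis title
  labs(x = NULL, y = "Percentage (%)", fill = "Test2")

p
```

Because the legend title comes from the fill aesthetic, `scale_fill_discrete(name = "Test2")` should give the same result if you prefer to keep all naming in the scale calls.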


Luke C
  • Thanks for your answer. It fits my needs and solves my problem. How can I avoid the percentages summing to something other than 100? I understand that this is a result of the round function, but I was wondering if there is a solution. Thanks again – ChinaskyM Nov 24 '17 at 21:12
  • @ChinaskyM - Sure- if you don't use `round` in the initial calculation (with `mutate`), then a more precise value is conserved. How many decimal places you want to display in the plot itself will depend on how clean you want it to look, but you can play around with `round` within the `geom_text()` call if you don't round in the initial calculation. – Luke C Nov 24 '17 at 21:23
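
The suggestion in the last comment can be sketched as follows: keep the unrounded proportion in the data and round only when building the label. The `set.seed` call, the `.groups` argument (assuming dplyr 1.0.0 or later), and the `totals` check are additions for illustration. Note that the printed labels can still add up to slightly more or less than 100; only the underlying bar heights are exact.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11], 100, replace = TRUE)
)

test.grouped <- test %>%
  group_by(test1, test3, test2) %>%
  summarize(t2.len = n(), .groups = "drop_last") %>%
  mutate(t2.prop = t2.len / sum(t2.len) * 100)  # no round() here

# Unrounded proportions total 100 within each test1/test3 combination
totals <- test.grouped %>%
  summarize(total = sum(t2.prop), .groups = "drop")

p <- ggplot(test.grouped,
            aes(x = test1, y = t2.prop, fill = test2, group = test2)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # Rounding happens only in the label, not in the plotted value
  geom_text(aes(label = paste0(round(t2.prop, 1), "%")),
            position = position_dodge(width = 0.9),
            vjust = -0.8) +
  facet_wrap(~ test3) +
  scale_y_continuous("Percentage (%)") +
  scale_x_discrete("")

p
```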