I have data like this:

df <-  data.frame(
  groups = c(rep("A", 5), 
              rep("B", 3), 
              rep("C", 2),
              rep(c("D","E","F","G","H"), 1)),
  subgroups = paste0("Subgroup", 1:15),
  length = c(103,112,141,152,50,
             77,82,88,
             59,86,
            3,17,1,5,24))

I want to plot a stacked bar graph of the group lengths. For each group, the height of the bar should show that group's percentage of the total length across all groups, and the bar should be divided to show the percentage of each subgroup within that group. In the original data the lengths are in the millions, so I want to combine some groups into an "Other" group, and its bar should be sectioned to show the percentage of each of the groups that were combined. So I will have 5 columns, for A, B, C, D and Other: the bars of the first 4 should reflect the length percentages of their subgroups, and the "Other" bar should reflect the length percentages of the groups that were combined.

I did some data wrangling with dplyr to add columns for the length percentages, so I now have group_length_percent for each group and subgroup_length_percent for each subgroup. However, I still cannot figure out how to plot them: when I plot with ggplot, either the percentages are summed so the bars exceed 100%, or the "Other" bar is divided equally among the combined groups instead of reflecting their lengths. I am confused and not sure how to proceed.

Thank you for your responses.

  • Please show your efforts/code for adding the new columns, even if you do not have working code. As it stands, your question can be categorized as too broad or as lacking a minimal reproducible example. – M-- Aug 21 '23 at 21:03

1 Answer

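First, recode every group other than A, B and C as "Other" using mutate(). Then calculate each subgroup's percentage within its group as length / sum(length), computed per group. Map the raw length (not the percentage) to the y axis so that the stacked bar heights stay proportional to each group's total, and use geom_col() instead of geom_bar(): geom_col() plots the values as given, whereas geom_bar() counts rows by default.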

library(tidyverse)

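df %>%
  # Lump every group except A, B and C into a single "Other" group
  mutate(groups = if_else(groups %in% c("A", "B", "C"), groups, "Other")) %>%
  # Label text: each row's share of its own group's total (.by needs dplyr >= 1.1.0)
  mutate(percent = scales::percent(length/sum(length)), .by = "groups") %>%
  # y is the raw length; group = subgroups gives one stacked segment per subgroup
  ggplot(aes(groups, length, group = subgroups, fill = groups)) +
  geom_col(color = "black") +
  # Centre each label inside its segment
  geom_text(aes(label = percent), position = position_stack(vjust = 0.5)) +
  theme_minimal(base_size = 16)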
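
If the y axis should read in percent of the overall total rather than in raw length, one option is to compute each segment's share of the grand total and stack that instead. The following is only a minimal sketch under that assumption; the column names pct_of_total, pct_in_group and group_pct are made up for illustration, and group_by() is used in place of .by so that it also runs on older dplyr versions.

```r
library(dplyr)

# Same toy data as in the question.
df <- data.frame(
  groups = c(rep("A", 5), rep("B", 3), rep("C", 2), "D", "E", "F", "G", "H"),
  subgroups = paste0("Subgroup", 1:15),
  length = c(103, 112, 141, 152, 50, 77, 82, 88, 59, 86, 3, 17, 1, 5, 24)
)

shares <- df %>%
  # Lump everything except A, B and C into "Other", as in the code above.
  mutate(groups = if_else(groups %in% c("A", "B", "C"), groups, "Other")) %>%
  # Share of the grand total (hypothetical column name): stacking these
  # segments gives bar heights that add up to 100 across all bars.
  mutate(pct_of_total = 100 * length / sum(length)) %>%
  group_by(groups) %>%
  # Share within the segment's own bar (hypothetical column name), for labels.
  mutate(pct_in_group = 100 * length / sum(length)) %>%
  ungroup()

# Height of each stacked bar = sum of its segments' shares of the total.
group_totals <- shares %>%
  group_by(groups) %>%
  summarise(group_pct = sum(pct_of_total), .groups = "drop")

print(group_totals)
```

With this toy data the lengths sum to 1000, so the bars come out at 55.8% for A, 24.7% for B, 14.5% for C and 5% for Other. Mapping pct_of_total to y in the ggplot call above, in place of length, should then give bars with those heights; the set of groups kept outside "Other" can be changed in the if_else() condition.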

Allan Cameron
  • Thank you! It works wonders. Apparently I just had to use the column with the lengths, not the percentages. One more question: is it possible to sort the x-axis and the subgroups in descending order? – cookiemonster Aug 21 '23 at 23:37