
I would like to plot how the number of workers per category ("A", "D", "F", "I") evolved from 2017 to 2021, using a stacked bar chart with one bar per year and a label in the middle of each category's segment. My dataset isn't in the right shape for this. From what I have seen here, I think I need pivot_wider() or pivot_longer(), but I don't really know how to use these functions. Could anyone help?

Here is the structure of my dataset, for reproducibility:

 structure(list(A = c("10", "7", "8", "8", "9", "Total"), D = c(23, 
 14, 29, 35, 16, 117), F = c(8, 7, 11, 6, 6, 38), I = c(449, 498, 
 415, 470, 531, 2363), annee = c("2017", "2018", "2019", "2020", 
 "2021", NA)), core = structure(list(A = c("10", "7", "8", "8", 
 "9"), D = c(23, 14, 29, 35, 16), F = c(8, 7, 11, 6, 6), I = c(449, 
 498, 415, 470, 531)), class = "data.frame", row.names = c(NA, 
 -5L)), tabyl_type = "two_way", totals = "row", row.names = c(NA, 
 6L), class = c("tabyl", "data.frame"))
  • The column 'A' is not numeric. How did you create this data? – akrun Aug 08 '22 at 15:10
  • "manually"... it means there were 10 workers in category A in 2017 – gerardlambert Aug 08 '22 at 15:11
  • I would expect the last row of A to hold the sum of A, and the last row of annee to be 'Total' – akrun Aug 08 '22 at 15:11
  • You may try `library(dplyr);library(tidyr);library(ggplot2);df1 %>% mutate(annee = coalesce(annee, A), A = as.numeric(replace(A, A == 'Total', sum(as.numeric(A[-n()]))))) %>% pivot_longer(cols = -annee) %>% ggplot(aes(x =annee, y = value, fill = name)) + geom_col()` – akrun Aug 08 '22 at 15:13

2 Answers

library(tidyverse)
library(ggrepel)

df <- structure(list(A = c("10", "7", "8", "8", "9", "Total"), D = c(
  23,
  14, 29, 35, 16, 117
), F = c(8, 7, 11, 6, 6, 38), I = c(
  449, 498,
  415, 470, 531, 2363
), annee = c(
  "2017", "2018", "2019", "2020",
  "2021", NA
)), core = structure(list(A = c(
  "10", "7", "8", "8",
  "9"
), D = c(23, 14, 29, 35, 16), F = c(8, 7, 11, 6, 6), I = c(
  449,
  498, 415, 470, 531
)), class = "data.frame", row.names = c(
  NA,
  -5L
)), tabyl_type = "two_way", totals = "row", row.names = c(
  NA,
  6L
), class = c("tabyl", "data.frame"))   

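df |> 
  filter(!is.na(annee)) |>      # drop the "Total" row, which has no year
  mutate(A = as.double(A)) |>   # A was stored as character
  pivot_longer(-annee, names_to = "category") |>  # one row per year and category
  ggplot(aes(annee, value, fill = category, label = value)) +
  geom_col() +
  # max.overlaps is raised from its default of 10 so that fewer labels are dropped
  geom_label_repel(position = position_stack(), max.overlaps = 20)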

Created on 2022-08-08 by the reprex package (v2.0.1)

Carl
  • I get the following message when I run this code with my full dataset (not just the head that I posted here for reproducibility): "Error: (converted from warning) ggrepel: 20 unlabeled data points (too many overlaps). Consider increasing max.overlaps". How can I deal with it? – gerardlambert Aug 08 '22 at 15:28
  • I've added `max.overlaps` to the above. It's 10 by default and you may need to adjust further. (You could alternatively use `geom_label` but the labels will overlap vertically.) – Carl Aug 08 '22 at 15:39
  • A further option, if you feel you have too many labels, is to only label, for example, if the value is greater than a threshold. – Carl Aug 08 '22 at 15:47
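The threshold idea from the last comment could look roughly like the sketch below. It is only an illustration: the cut-off `min_label` and the data frame `counts` (the question's figures, retyped without the totals row) are assumptions, and plain `geom_text()` with `position_stack(vjust = 0.5)` is swapped in for `geom_label_repel()` so that each label sits in the middle of its segment.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The question's counts, retyped without the totals row (assumed input)
counts <- data.frame(
  annee = c("2017", "2018", "2019", "2020", "2021"),
  A = c(10, 7, 8, 8, 9),
  D = c(23, 14, 29, 35, 16),
  F = c(8, 7, 11, 6, 6),
  I = c(449, 498, 415, 470, 531)
)

min_label <- 20  # hypothetical cut-off; tune it to your data

long <- counts |>
  pivot_longer(-annee, names_to = "category") |>
  # Blank out small values instead of dropping the rows, so that every
  # segment still takes part in the stacking of the label layer
  mutate(label = ifelse(value >= min_label, as.character(value), ""))

p <- ggplot(long, aes(annee, value, fill = category, label = label)) +
  geom_col() +
  # vjust = 0.5 centres each label vertically within its segment
  geom_text(position = position_stack(vjust = 0.5))
p
```

Blanking the label rather than filtering the rows matters here: `position_stack()` computes positions from the rows the layer receives, so removing small segments from the label layer would shift the remaining labels away from their bars.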

Once you remove the total row and ensure that columns A through I are numeric, you can use pivot_longer() and pass the result to ggplot() like this:

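library(tidyverse)

# `data` is assumed to hold the dataset from the question
data %>% 
  filter(A != "Total") %>%               # drop the totals row
  mutate(across(A:I, as.numeric)) %>%    # A is stored as character
  # long format: one row per year and category
  pivot_longer(cols = -annee, names_to = "group", values_to = "ct") %>% 
  ggplot(aes(annee, ct, fill = group)) + 
  geom_col()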
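To make the reshaping step more concrete, here is a minimal sketch of what pivot_longer() produces, using only the first two years of the question's data. The data frame `wide` is an assumption (already numeric, no totals row), and the pivot_wider() call is included only to show that it undoes the operation.

```r
library(dplyr)
library(tidyr)

# First two years of the question's data, already numeric (assumed input)
wide <- data.frame(
  annee = c("2017", "2018"),
  A = c(10, 7),
  D = c(23, 14),
  F = c(8, 7),
  I = c(449, 498)
)

# Every column except annee becomes a (group, ct) pair:
# 2 years x 4 categories = 8 rows with columns annee, group, ct
long <- wide %>%
  pivot_longer(cols = -annee, names_to = "group", values_to = "ct")
print(long)

# pivot_wider() goes the other way, back to one column per category
back <- long %>%
  pivot_wider(names_from = group, values_from = ct)
print(back)
```

The long layout is the one ggplot() needs for a stacked bar chart, because `fill` has to be mapped to a single column that holds the category.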

I did not add the category labels because group I dominates every year, which leaves little room for the other categories' labels; you might want to reconsider that visualization.
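One possible alternative, offered only as a sketch and not as what this answer prescribes, is to give each category its own panel with a free y axis, so that the small groups are no longer squeezed by group I. The data frame `counts` below is an assumption: the question's figures retyped without the totals row.

```r
library(tidyr)
library(ggplot2)

# The question's counts, retyped without the totals row (assumed input)
counts <- data.frame(
  annee = c("2017", "2018", "2019", "2020", "2021"),
  A = c(10, 7, 8, 8, 9),
  D = c(23, 14, 29, 35, 16),
  F = c(8, 7, 11, 6, 6),
  I = c(449, 498, 415, 470, 531)
)

long <- pivot_longer(counts, cols = -annee, names_to = "group", values_to = "ct")

p <- ggplot(long, aes(annee, ct, fill = group, label = ct)) +
  geom_col(show.legend = FALSE) +
  # each bar has a single segment, so vjust = 0.5 puts the label mid-bar
  geom_text(position = position_stack(vjust = 0.5)) +
  # one panel per category, each with its own y scale
  facet_wrap(~ group, scales = "free_y")
p
```

The trade-off is that bar heights can no longer be compared across panels, so this layout suits reading the trend within each category rather than the yearly totals.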

langtang