I have a dataset with the following structure (output of dput(head(df))):

 structure(list(type_de_sejour = c("Amb", "Hosp", 
 "Hosp", "Amb", "Hosp", "Sea"), 
 specialite = c("ANES", "ANES", 
 "Autres", "CARD", "CARD", "CARD"
 ), CA_annee_N = c(2712L, 122180L, 0L, 822615L, 6905494L, 
 0L), nb_sejours_N = c(8L, 32L, 0L, 1052L, 2776L, 0L), nb_doc_N = c(5L, 
 8L, 0L, 12L, 15L, 0L), CA_annee_N1 = c(4231L, 78858L, 6587L, 
 327441L, 6413083L, 0L), nb_sejours_N1 = c(13L, 29L, 2L, 532L, 
 2819L, 0L), nb_doc_N1 = c(6L, 9L, 1L, 12L, 12L, 0L
 ), CA_annee_N2 = c(4551L, 27432L, 0L, 208326L, 7465440L, 
 575L), nb_sejours_N2 = c(15L, 8L, 0L, 463L, 3393L, 1L), nb_doc_N2 = c(6L, 
 4L, 0L, 11L, 13L, 1L), site = c("FR", "FR", "FR", "FR", 
 "FR", "FR")), row.names = c(NA, 6L), class = "data.frame")

I am trying to plot the percentage that each "specialite" represents in the total "nb_sejours_N", separately for each "site" (ideally by faceting, or with two plots, one per site), after filtering on type_de_sejour == "Amb".

I have tried the following code:

df %>%
    mutate(volume_N == nb_sejours_N,
           volume_N1 == nb_sejours_N1,
           volume_N2 == nb_sejours_N2)%>%
   filter(type_de_sejour == "Amb")%>%
   group_by(site) %>%
   mutate(proportion_N = volume_N/sum(volume_N, na.rm = TRUE),
          proportion_N1 = volume_N1/sum(volume_N1, na.rm = TRUE),
          proportion_N2 = volume_N2/sum(volume_N2, na.rm = TRUE))

Unfortunately, it doesn't work, so I can't go any further. I would also like to know whether anyone knows an efficient way to plot what I'm trying to represent.

  • Sorry, what is site in your example? – Dasr Aug 03 '22 at 14:17
  • I get `object 'volume_N' not found`. Where is it defined? Do you mean to be using `==` in your first `mutate` (three times), or should it be assigning with `=`? (Specifically, try replacing your first mutate with `mutate(volume_N = nb_sejours_N, volume_N1 = nb_sejours_N1, volume_N2 = nb_sejours_N2)`). – r2evans Aug 03 '22 at 14:18
  • @r2evans, yes, that was it, thank you! I'm keeping the question open in case someone can help me with the plot. –  Aug 03 '22 at 14:32
  • @Dasr "site" means "geographical location", and there are two geographical locations in my dataset –  Aug 03 '22 at 14:32
  • So, the first thing you may want to do is pivot longer, but I'm not quite getting what the structure of the plot should be. You should be able to modify the following: `df %>% filter(type_de_sejour == "Amb") %>% pivot_longer(cols = c("nb_sejours_N","nb_sejours_N1","nb_sejours_N2"), values_to = "visit") %>% ggplot(aes(fill=name, y=visit, x=name)) + geom_bar(position="stack", stat="identity")` – Dasr Aug 03 '22 at 14:40
  • Thanks, it works @Dasr! You can post it as an answer if you want. I just have another question: would you know how to place the value of each bar (the value that the variable "nb_de_sejours" takes for each year N, N1, N2) as a label just above the bar? –  Aug 04 '22 at 09:12
  • You may just want to adjust the xy position of the labels. – Dasr Aug 04 '22 at 09:22

1 Answer

I believe the following works:


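# packages used below (pipe and filter, pivot_longer, plotting)
library(dplyr)
library(tidyr)
library(ggplot2)

# creating plot: keep the "Amb" rows, stack the three year columns into name/visit
p = df %>% filter(type_de_sejour == "Amb") %>% 
  pivot_longer(cols = c("nb_sejours_N","nb_sejours_N1","nb_sejours_N2"), values_to = "visit") %>% 
  ggplot(aes(fill=name, y=visit, x=name)) + geom_bar(position="stack", stat="identity")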


# creating summary of totals for each column
totals = df %>% filter(type_de_sejour == "Amb") %>% 
  pivot_longer(cols = c("nb_sejours_N","nb_sejours_N1","nb_sejours_N2"), values_to = "visit") %>% 
  group_by(name) %>% summarise(total = sum(visit))


# adding totals on top of bars to plot
p + geom_text(aes(name, total, label = total, fill = NULL), data = totals)
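
The question also asked for the share of each "specialite" per "site". What follows is only a rough sketch of one way to get there, not a tested solution for the real data: it uses a small made-up data frame (`df_demo`, with a hypothetical second site "BE"), computes the proportion within each site and year after pivoting, and facets by site. The names `df_demo`, `shares`, `year` and `p_share` are my own, and `scales::percent` is assumed to be available (scales is installed alongside ggplot2).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data with the same column names as the question; values and the
# second site "BE" are invented for illustration only.
df_demo <- data.frame(
  type_de_sejour = c("Amb", "Amb", "Hosp", "Amb", "Amb"),
  specialite     = c("ANES", "CARD", "CARD", "ANES", "CARD"),
  site           = c("FR", "FR", "FR", "BE", "BE"),
  nb_sejours_N   = c(8L, 1052L, 2776L, 20L, 60L),
  nb_sejours_N1  = c(13L, 532L, 2819L, 10L, 30L),
  nb_sejours_N2  = c(15L, 463L, 3393L, 5L, 15L)
)

# Keep "Amb" rows, go long, then compute each specialite's share
# of the visits within its site and year.
shares <- df_demo %>%
  filter(type_de_sejour == "Amb") %>%
  pivot_longer(cols = starts_with("nb_sejours_"),
               names_to = "year", values_to = "visit") %>%
  group_by(site, year) %>%
  mutate(proportion = visit / sum(visit, na.rm = TRUE)) %>%
  ungroup()

# Stacked bars of shares, one panel per site, labels centred in each segment.
p_share <- ggplot(shares, aes(x = year, y = proportion,
                              fill = specialite, group = specialite)) +
  geom_col(position = "stack") +
  geom_text(aes(label = scales::percent(proportion, accuracy = 0.1)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~ site)

print(p_share)
```

Grouping by both site and year before dividing is what makes each bar sum to 100% within its own panel; grouping by site alone would mix the three years into one denominator.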

Dasr
  • Great, it works! The only remaining problem is that the bars are not all the same height, so the labels sometimes end up well above the bar, but your code already helps a lot, so thanks! –  Aug 04 '22 at 09:35
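
On the label position, here is a minimal sketch of the adjustment suggested in the comments, assuming the gap comes from how the text is placed rather than from the data. It uses invented totals (`totals_demo` is a hypothetical stand-in for the `totals` summary) and sets `vjust` in `geom_text()` so every label sits a fixed distance above its own bar, whatever the bar height.

```r
library(ggplot2)

# Made-up totals, one row per year column (illustrative values only).
totals_demo <- data.frame(
  name  = c("nb_sejours_N", "nb_sejours_N1", "nb_sejours_N2"),
  total = c(1060L, 545L, 478L)
)

# vjust = -0.5 lifts each label just above the top of its bar;
# the y-axis expansion leaves headroom so the tallest label is not clipped.
p_labels <- ggplot(totals_demo, aes(x = name, y = total, fill = name)) +
  geom_col() +
  geom_text(aes(label = total), vjust = -0.5) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

print(p_labels)
```

Because `vjust` is expressed relative to the text height, the offset stays the same for short and tall bars, unlike adding a constant to the y value.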