1

I'm creating a Sankey chart in R with networkD3::sankeyNetwork(). I want to show percentage besides the node label, but can't get this to work using an approach on a similar post and to show clean names. Originally I had the code for the node name to appear clean (image 1), but the Sankey didn't run correctly when I added the code to add on the percentages. I've managed to generate the percentage, but they are incorrect (I would like each node to equal 100%).

(R netWorkD3 Sankey - add percentage by js doesn't work) (How to plot Sankey Graph with R networkD3 values and percentage below each node)

Desired output (% next to/below count): enter image description here

Current output with code attempt to add %: enter image description here

Here is my attempt:

DF:

df <-
  structure(
    list(
      flow_1 = c(
        "myocardial infarction",
        "myocardial infarction",
        "myocardial infarction",
        "myocardial infarction",
        "myocardial infarction",
        "myocardial infarction",
        "angina pectoris",
        "angina pectoris",
        "angina pectoris",
        "angina pectoris",
        "angina pectoris",
        "angina pectoris",
        "ischaemia",
        "ischaemia",
        "ischaemia",
        "ischaemia",
        "ischaemia",
        "ischaemia"
      ),
      flow_2 = c(
        "death",
        "myocardial infarction",
        "heart failure",
        "cardiac rhythm disorders",
        "angina pectoris",
        "ischaemia",
        "death",
        "myocardial infarction",
        "heart failure",
        "cardiac rhythm disorders",
        "angina pectoris",
        "ischaemia",
        "death",
        "myocardial infarction",
        "heart failure",
        "cardiac rhythm disorders",
        "angina pectoris",
        "ischaemia"
      ),
      flow_3 = c(
        NA,
        "death",
        "death",
        "heart failure",
        "cardiac rhythm disorders",
        "angina pectoris",
        NA,
        "death",
        "death",
        "death",
        "cardiac rhythm disorders",
        "death",
        NA,
        "death",
        "death",
        "heart failure",
        "cardiac rhythm disorders",
        "angina pectoris"
      )
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA,-18L)
  )

#attempt

library(dplyr)
library(tidyverse)
library(networkD3)
library(RColorBrewer)
plt <- sankeyNetwork(Links = links2, Nodes = nodes,
                     Source = "IDsource", 
                     Target = "IDtarget",
                     Value = "value", 
                     NodeID = "name",
                     nodeWidth = 8,
                     nodePadding = 15,
                     fontFamily = "Arial"
)
df3 <- df %>%
  group_by(Source) %>%
  mutate(sPerc = paste0(round(sum(Value) / sum(df3$Value) * 100, 2), "%")) %>% 
  group_by(Destination) %>% 
  mutate(dPerc = paste0(round(sum(Value) / sum(df3$Value) * 100, 2), "%")) %>% 
  pivot_longer(c(Destination, Source)) %>% 
  mutate(Perc = ifelse(name == "Destination",
                       dPerc, sPerc)) %>%  
  select(Value, value, Perc) %>%           
  group_by(value, Perc) %>% 
  summarise(Value = sum(Value)) 
plt$x$nodes <- right_join(plt$x$nodes, df3, by = c("name" = "value"))
htmlwidgets::onRender(plt, '
                      function(el, x) {
                        d3.select(el).selectAll(".node text")
                          .text(d => d.name + " (" + d.value + ")" + " (" + d.Perc + ")")
                      }')
Rokit87
  • 23
  • 3
  • What do you want the percent value to be a percent of? – CJ Yetman Nov 01 '22 at 19:54
  • @CJYetman I would like the percent value of the the total in each vertical set of nodes (hope this makes sense, not sure how else to describe them) e.g. the first 'sequence' of the sankey has 18 total encounters, so I would like 33.3% to be display next to all three counts of each condition. I would also like to show this for the remaining two sequences. Thanks! – Rokit87 Nov 02 '22 at 04:06

1 Answers1

0
df %>% 
  pivot_longer(cols = everything()) %>% 
  filter(!is.na(value)) %>% 
  group_by(name) %>% 
  mutate(total = n()) %>% 
  group_by(name, value) %>% 
  summarise(sub = paste0(round(n() / total * 100), "%"), .groups = "drop") %>% 
  unique()
#> # A tibble: 13 × 3
#>    name   value                    sub  
#>    <chr>  <chr>                    <chr>
#>  1 flow_1 angina pectoris          33%  
#>  2 flow_1 ischaemia                33%  
#>  3 flow_1 myocardial infarction    33%  
#>  4 flow_2 angina pectoris          17%  
#>  5 flow_2 cardiac rhythm disorders 17%  
#>  6 flow_2 death                    17%  
#>  7 flow_2 heart failure            17%  
#>  8 flow_2 ischaemia                17%  
#>  9 flow_2 myocardial infarction    17%  
#> 10 flow_3 angina pectoris          13%  
#> 11 flow_3 cardiac rhythm disorders 20%  
#> 12 flow_3 death                    53%  
#> 13 flow_3 heart failure            13%
CJ Yetman
  • 8,373
  • 2
  • 24
  • 56