
I am trying to visualize my data via a sankey diagram.

I have the following dataframe:

sankey1 <- structure(list(pat_id = c(10037, 10264, 10302, 10302, 10302, 
10344, 10482, 10482, 10482, 10613, 10613, 10613, 10628, 10851, 
11052, 11203, 11214, 11214, 11566, 11684, 11821, 11945, 11945, 
11952, 11952, 12122, 12183, 12774, 13391, 13573, 13643, 14298, 
14556, 14556, 14648, 14862, 14935, 14935, 14999, 15514, 15811, 
16045, 16045, 16190, 16190, 16190, 16220, 16220, 16220, 16220
), contactnummer = c(1, 1, 1, 2, 3, 1, 1, 2, 3, 1, 2, 3, 1, 1, 
1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 
1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 3, 1, 2, 3, 99), Combo2 = c(1, 
1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 
2, 4, 4, 1, 5, 1, 1, 1, 1, 3, 3, 1, 5, 1, 1, 3, 1, 1, 1, 1, 1, 
3, 6, 3, 1, 1, 1, 1), treatment = c(99, 0, 0, 1, 1, 0, 99, 99, 
99, 99, 99, 1, 1, 0, 1, 99, 99, 99, 0, 99, 99, 0, 0, 0, 1, 99, 
99, 0, 0, 0, 0, 0, 1, 1, 1, 99, 99, 1, 0, 0, 1, 0, 0, 0, 1, 1, 
99, 99, 99, 99)), row.names = c(NA, 50L), class = c("data.table", 
"data.frame"))

An ID number ("pat_id") can have multiple rows; each row is a contact ("contactnummer"). My aim is to visualize which combinations ("Combo2") lead to which treatments ("treatment"), and at which contact.

I hope to visualise this via a sankey diagram (https://r-graph-gallery.com/321-introduction-to-interactive-sankey-diagram-2.html).

Ideally, the desired output would look similar to this: [image: desired sankey diagram]

Ideally I would like the combinations ("Combo2") shown as arrows, with a different colour for each unique combination. These arrows should then lead to a treatment. After that I would like them to continue: after contact 1, if an ID number has a second contact, the arrows should again show which combinations occur after that treatment and to which treatment they lead in the second contact.

I've tried using the following script, but without success.

library(dplyr)
library(networkD3)

# Create a data frame for the Sankey diagram
sankey_data <- sankey1 %>%
  group_by(pat_id, Combo2, treatment, contactnummer) %>%
  summarise(Count = n()) %>%
  # summarise() drops only the last grouping level, so the data are still
  # grouped by pat_id, Combo2 and treatment: within such a group lead(treatment)
  # can only return that same treatment or NA
  mutate(Target = lead(treatment), Value = Count) %>%
  filter(!is.na(Target))

# Create a list of unique nodes with color attributes
combo2_nodes <- unique(sankey_data$Combo2)
treatment_nodes <- unique(sankey_data$treatment)
nodes <- data.frame(
  name = c(combo2_nodes, treatment_nodes),
  color = c(rep("Combo2", length(combo2_nodes)), rep("Treatment", length(treatment_nodes)))
)

# Create a list of links (0-based node indices, as networkD3 expects)
# note: Combo2 and treatment share values (e.g. 1), so match() returns the
# first hit, i.e. the Combo2 node, for both
links <- data.frame(
  source = match(sankey_data$Combo2, nodes$name) - 1,
  target = match(sankey_data$Target, nodes$name) - 1,
  value = sankey_data$Value
)

# Create the Sankey diagram with color attributes
sankey_plot <- sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  units = "Count",
  NodeGroup = "color"  # Specify the color attribute
)

# Display the plot
sankey_plot

But this does not create the diagram the way I would like. I am very inexperienced with sankey diagrams. Any tips?
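A likely contributing problem in the script above (my reading of the code, not something confirmed in the thread): Combo2 and treatment share numeric values such as 1, and `match()` returns the first hit, so a link meant to end at treatment 1 ends at the Combo2 node 1 instead. The sketch below uses a few example values from the data (Combo2 1 to 3, treatment 0, 1 and 99) to show the effect, and one way around it: prefixing each node name with its variable. The names `nodes_fixed`, `idx_treatment_1` and `idx_fixed` are hypothetical.

```r
# Example node values taken from the data: some Combo2 values and the treatment codes
combo2_nodes    <- c(1, 2, 3)
treatment_nodes <- c(0, 1, 99)

# As in the question: one pool of numeric node names
nodes <- data.frame(name = c(combo2_nodes, treatment_nodes))

# match() returns the first position, so treatment 1 is found at the Combo2 node
idx_treatment_1 <- match(1, nodes$name) - 1
print(idx_treatment_1)  # 0, i.e. the Combo2 node "1"

# Prefixing each name with its variable keeps the two groups of nodes distinct
nodes_fixed <- data.frame(
  name = c(paste0("Combo2_", combo2_nodes), paste0("treatment_", treatment_nodes))
)
idx_fixed <- match("treatment_1", nodes_fixed$name) - 1
print(idx_fixed)  # 4, the treatment node
```

The answer below avoids the same clash by building node names such as `1_treatment_0`.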

RvS

1 Answer


Sorry, it is not entirely clear to me how you would like to manage the groups etc., but we can start with something like this:

EDIT AFTER COMMENT

# load necessary libraries
library(networkD3)
library(dplyr)
library(tidyr)


# reshaping the data: the goal is to create a data.frame
# with sources and targets to feed the sankey
df <-
sankey1 %>%  
  # wide format, one column per contact and variable (e.g. "1_Combo2", "1_treatment")
  pivot_wider(id_cols = pat_id
               , names_from = contactnummer
               , values_from = c(Combo2,treatment)
               ,names_glue = "{contactnummer}_{.value}"
               ,names_sort=TRUE) %>% 
  # back to long format
  pivot_longer(!pat_id, names_to = 'variable', values_to = 'value') %>%
  # remove NAs (contacts a patient does not have)
  filter(!is.na(value)) %>%
  # grouping and creating the source field by pat_id
  group_by(pat_id) %>% 
  mutate(source = paste(substr(variable,1,15),value, sep = '_')) %>% 
  # useful columns
  select(pat_id, source) %>% 
  # arrange: as strings, "1_Combo2_..." < "1_treatment_..." < "2_Combo2_..."
  arrange(pat_id, source) %>% 
  # the target of each step is the next step of the same patient
  mutate(target = lead(source)) 

# define source and target, dropping each patient's last step (no target)
links <- data.frame(source = df$source,
                    target = df$target) %>% 
  filter(!is.na(target))

# getting unique nodes
nodes <- data.frame(name = as.character(unique(c(links$source, links$target)))) 

# matching links and nodes, then indexing from 0 as networkD3 expects
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1

# count how many rows (patient transitions) follow each link
links <- links %>% count(source, target)
   
# plot it!
sankeyNetwork(Links = links
              , Nodes = nodes
              , Source = 'source'
              , Target = 'target'
              , Value = 'n'
              , NodeID = 'name'
              ,fontSize = 15)
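One caveat with `arrange(pat_id, source)`: it orders the steps by comparing the labels as strings. With the contacts in this data (1, 2, 3 and 99) that happens to give the right order, but a contact 10 would sort before contact 2. A sketch of a numeric alternative, assuming hypothetical helper columns `contact` and `step` (1 = Combo2, 2 = treatment) kept alongside each label:

```r
library(dplyr)

# Labels in the answer's "<contact>_<variable>_<value>" style, plus a contact 10
steps <- data.frame(
  label   = c("10_Combo2_1", "2_Combo2_1", "1_treatment_0", "1_Combo2_1"),
  contact = c(10, 2, 1, 1),   # hypothetical numeric contact column
  step    = c(1, 1, 2, 1)     # hypothetical: 1 = Combo2, 2 = treatment
)

# String order: "10_..." is placed before "2_..." because "1" < "2"
string_order <- sort(steps$label)
print(string_order)

# Numeric order: by contact, then Combo2 before treatment within a contact
numeric_order <- steps %>% arrange(contact, step) %>% pull(label)
print(numeric_order)
```

In the answer's pipeline, one option (untested on the full data) is to split the wide column names back into parts with `pivot_longer(!pat_id, names_to = c("contact", "variable"), names_sep = "_")` and then sort with `arrange(pat_id, as.numeric(contact), variable)` instead of sorting on `source`.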


Hope it helps!

s__
  • Thank you for your answer! This is a great first step, which I could not have accomplished by myself! Ideally I would like the combinations ("Combo2") shown as arrows, with a different colour for each unique combination. These arrows should then lead to a treatment (as your example does!). But then I would like them to continue: after contact 1, if an ID number has a second contact, the arrows should again show which combinations occur after that treatment and to which treatment they lead in the second contact. It's difficult to explain via text, but does this make it a little clearer? – RvS Aug 23 '23 at 14:41
  • @RvS Thanks, see the edit; let's see whether it fits your request better. – s__ Aug 24 '23 at 12:11
  • Thank you very much! This fits much better! It is almost exactly how I want it to be! However, I see that after the treatment of contact 1 (the second node, with blue, red and grey colours) the next node contains the Combo2 variations (the purple, yellow, orange and grey boxes). My aim is that, after the treatment in the first contact, the connections of the second contact (the Combo2 values) go directly to the treatment in the second contact. Ideally I would like a specific colour for each connection (for each of the 8 different Combo2 values) and for each treatment. – RvS Aug 27 '23 at 10:10
  • Is my explanation making any sense? Otherwise I'll try to edit the question again. – RvS Aug 30 '23 at 16:45
  • Sorry, it is still not entirely clear to me. I've understood that each combo is followed by a treatment, alternating, and that the order is given by contactnummer. Could you explain the real gist behind it? – s__ Sep 01 '23 at 10:16
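A possible layout for what RvS describes in the comments, sketched under my own assumptions: each row becomes one link whose source is the previous contact's treatment node (or a hypothetical `Start` node for the first contact), whose target is this contact's treatment node, and whose colour comes from its Combo2 value via the `LinkGroup` argument of `sankeyNetwork()`. To keep it self-contained it uses four patients copied from `sankey1`; replacing `visits` with `sankey1` should give the full diagram. The node labels, the `Start` node and the grouping of nodes by treatment are illustrative choices, not from the thread.

```r
library(dplyr)
library(networkD3)

# Four patients copied from sankey1 (10037, 10302, 10482, 16190)
visits <- data.frame(
  pat_id        = c(10037, 10302, 10302, 10302, 10482, 10482, 10482, 16190, 16190, 16190),
  contactnummer = c(1, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Combo2        = c(1, 1, 1, 2, 2, 1, 1, 3, 6, 3),
  treatment     = c(99, 0, 1, 1, 99, 99, 99, 0, 1, 1)
)

# One link per contact: previous treatment (or "Start") -> this contact's treatment
links <- visits %>%
  arrange(pat_id, contactnummer) %>%
  group_by(pat_id) %>%
  mutate(
    target = paste0("Contact ", contactnummer, ": treatment ", treatment),
    source = lag(target, default = "Start"),   # hypothetical start node
    combo  = paste0("Combo2 ", Combo2)          # link colour group
  ) %>%
  ungroup() %>%
  count(source, target, combo, name = "value") %>%
  as.data.frame()

# Nodes, grouped (and so coloured) by treatment regardless of contact
nodes <- data.frame(name = unique(c(links$source, links$target)))
nodes$group <- sub("^Contact [0-9]+: ", "", nodes$name)

# networkD3 expects 0-based node indices
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1

p <- sankeyNetwork(
  Links = links, Nodes = nodes,
  Source = "source", Target = "target", Value = "value",
  NodeID = "name", NodeGroup = "group",
  LinkGroup = "combo",  # one colour per Combo2 value
  fontSize = 12
)
```

Print `p` to display the widget. On the full data, patient 16220's contact 99 would show up as its own "Contact 99" column; if specific colours are needed per Combo2 value or treatment, the `colourScale` argument of `sankeyNetwork()` can be set.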