I am currently struggling with a sankey diagram visualising the flow of anti-cancer treatment for women with advanced breast cancer.

I have a column for each line of treatment (Beh1, Beh2, etc.) naming the treatment given (6 options or "Other").

However, some of the patients do not receive all 6 lines of treatment that my df currently contains, and have thus been given NA in several columns.

An example

sankey <- data.frame(ID = c("1","2","3","4","5"),
                 Beh1 = c("TDM1","Capecitabine", "Capecitabine", "Eribulin", "TDM1"),
                 Beh2 = c("Capecitabine", "NA", "Taxane", "Eksperimentiel", "Taxane"),
                 Beh3 = c("Eribulin", "NA", "Eribulin", "Eribulin", "Eribulin"))

And the diagram: [image]

    library(flipPlots)  # provides SankeyDiagram()

    SankeyDiagram(sankey[-1],
                  link.color = "Source",
                  variables.share.values = TRUE)

What I wished it showed:

[image]

Any help would be highly appreciated

Kind regards

  • Just a guess, try to recode string `"NA"` into a real `NA`: `sankey[ sankey == "NA" ] <- NA` – zx8754 Jun 15 '21 at 10:16
  • @zx8754 I'm using library(flipPlots) for the SankeyDiagram, and in my 'true' dataset they are 'real' NAs. I just tried your solution, but it didn't work – Tobias Berg Jun 15 '21 at 10:20
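
A minimal base-R sketch of the recoding suggested in the first comment, using the example data from the question. It only illustrates that the string "NA" and a real NA are different things; as the follow-up comment notes, the recoding on its own did not change the diagram. The variable names `n_missing_before` and `n_missing_after` are made up for illustration.

```r
# Example data from the question; "NA" here is a character string, not a missing value
sankey <- data.frame(ID = c("1","2","3","4","5"),
                     Beh1 = c("TDM1","Capecitabine", "Capecitabine", "Eribulin", "TDM1"),
                     Beh2 = c("Capecitabine", "NA", "Taxane", "Eksperimentiel", "Taxane"),
                     Beh3 = c("Eribulin", "NA", "Eribulin", "Eribulin", "Eribulin"))

# Before recoding, is.na() does not see the "NA" strings
n_missing_before <- sum(is.na(sankey))

# Recode the string "NA" into a real missing value
sankey[sankey == "NA"] <- NA

# After recoding, the two cells of patient 2 are counted as missing
n_missing_after <- sum(is.na(sankey))
```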

2 Answers

Is this what you want?

library(tidyverse)
library(flipPlots)  # provides SankeyDiagram()
# To install flipPlots:
# require(devtools)
# install_github("Displayr/flipPlots")

sankey <- data.frame(ID = c("1","2","3","4","5"),
                     Beh1 = c("TDM1","Capecitabine", "Capecitabine", "Eribulin", "TDM1"),
                     Beh2 = c("Capecitabine", "NA", "Taxane", "Eksperimentiel", "Taxane"),
                     Beh3 = c("Eribulin", "NA", "Eribulin", "Eribulin", "Eribulin"))

sankey %>%
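  # Recode "NA" strings column by column; na_if() on a whole data frame fails in recent dplyr versions
  dplyr::mutate(dplyr::across(dplyr::everything(), ~ dplyr::na_if(.x, "NA"))) %>%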
  tidyr::drop_na() %>%
  dplyr::select(-1) %>% 
  SankeyDiagram(link.color = "Source",
              variables.share.values = TRUE)


Claudiu Papasteri
  • Thanks for the fast reply. Unfortunately this results in the diagram only showing those with complete data for all treatments. It ought to still represent two patients receiving 'capecitabine' in the first row of nodes. – Tobias Berg Jun 15 '21 at 10:29
  • Yes, I figured, but I did not find a way with the SankeyDiagram function, and then I started to wonder what the point of that unconnected node really was. – Claudiu Papasteri Jun 15 '21 at 14:11
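
To make the objection in the comments concrete, here is a small base-R sketch (not part of either answer) that counts first-line treatments before and after dropping incomplete rows. `complete.cases()` is used here as a stand-in for `tidyr::drop_na()`, and the object names are made up for illustration.

```r
# Example data from the question, with real NA values for patient 2
sankey <- data.frame(ID = c("1","2","3","4","5"),
                     Beh1 = c("TDM1","Capecitabine", "Capecitabine", "Eribulin", "TDM1"),
                     Beh2 = c("Capecitabine", NA, "Taxane", "Eksperimentiel", "Taxane"),
                     Beh3 = c("Eribulin", NA, "Eribulin", "Eribulin", "Eribulin"))

# Patients per first-line treatment in the full data
first_line_all <- table(sankey$Beh1)

# Keep only rows without any NA (the same rows tidyr::drop_na() would keep)
complete <- sankey[complete.cases(sankey), ]

# Patients per first-line treatment after dropping incomplete rows
first_line_complete <- table(complete$Beh1)
```

Patient 2 disappears entirely, so only one of the two Capecitabine patients is left in the first column.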

First, I would convert your data.frame so that it contains factors instead of character columns and actual NA values instead of character "NA"s.

sankey <- data.frame(ID = c("1","2","3","4","5"),
                     Beh1 = c("TDM1","Capecitabine", "Capecitabine", "Eribulin", "TDM1"),
                     Beh2 = c("Capecitabine", NA, "Taxane", "Eksperimentiel", "Taxane"),
                     Beh3 = c("Eribulin", NA, "Eribulin", "Eribulin", "Eribulin"), 
                     stringsAsFactors = TRUE)

If you call SankeyDiagram with output.data.only = TRUE you get the nodes and links that the function uses to produce the diagram. You can then edit the links to exclude the link involving the NAs and call SankeyDiagram with the modified links. Adding sinks.right = FALSE to the SankeyDiagram call ensures that the second link for "Beh1: Capecitabine" doesn't extend to the right edge of the plot.

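# Get the nodes and links that SankeyDiagram would use, without drawing the plot
links.and.nodes <- SankeyDiagram(sankey[-1], link.color = "Source", output.data.only = TRUE, 
                                 variables.share.values = TRUE, sinks.right = FALSE)
# Drop the links that involve NA
links.and.nodes$links <- links.and.nodes$links[!is.na(links.and.nodes$links$group), ]
# Draw the diagram from the modified links
SankeyDiagram(links.and.nodes = links.and.nodes, link.color = "Source", 
              variables.share.values = TRUE, sinks.right = FALSE)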
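
The filtering step can be tried in isolation on a mock links table. This is only a sketch: the exact structure returned by `output.data.only = TRUE` is not shown in the answer, so the data frame below is a hypothetical stand-in with made-up values. Only the `group` column name comes from the code above; `source`, `target` and `value` are assumed.

```r
# Hypothetical stand-in for links.and.nodes$links (values are made up;
# only the `group` column name is taken from the answer)
links <- data.frame(source = c(0, 0, 1),
                    target = c(2, 3, 3),
                    value  = c(1, 1, 2),
                    group  = c("Capecitabine", NA, "TDM1"))

# Keep only the links whose group is not missing, as in the answer
kept <- links[!is.na(links$group), ]
```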

SankeyDiagram output

mmclean