
So I'm trying to create a Sankey network that tracks patients' treatment journeys. The problem is that some patients stop after the first treatment and never move on to a second one.

I'm using networkD3, and I'm forced to remove all the links and nodes that don't have a target, which results in lower counts.

Here's the sample data and code that I'm using right now.

```r
library(dplyr)
library(tidyr)
library(networkD3)

# strip.white = TRUE removes the leading spaces of the indented CSV lines
df <- read.csv(header = TRUE, as.is = TRUE, strip.white = TRUE,
               text = 'tx1,tx2,tx3,tx4
               D1,D2,D3,D4
               D2,D1,,
               D1,,,
               D3,,,')

# Long format: one row per (patient, step); target = drug at the next step
links <-
  df %>%
  mutate(row = row_number()) %>%
  gather('column', 'source', -row) %>%
  mutate(group = source) %>%
  mutate(column = match(column, names(df))) %>%
  group_by(row) %>%
  arrange(column) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  filter(!is.na(target)) %>%
  filter(target != "")   # drops every step that has no next treatment

# Make node names unique per step, e.g. "D1_1" vs "D1_2"
links <-
  links %>%
  mutate(source = paste0(source, '_', column)) %>%
  mutate(target = paste0(target, '_', column + 1)) %>%
  select(source, target, group)

nodes <- data.frame(name = unique(c(links$source, links$target)))

# networkD3 expects zero-based node indices
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
links$value <- 1

# Divide each link's value by the number of links leaving its source node
grp <- data.frame(source = links$source)
grp_2 <- grp %>% count(source)
colnames(grp_2)[which(names(grp_2) == "n")] <- "n_source"

links <- merge(x = links, y = grp_2, by = "source", all = TRUE)
links$value <- links$value / links$n_source

# Drop the step suffix for display
nodes$name <- sub('_[0-9]+$', '', nodes$name)
nodes$group <- nodes$name

sankeyNetwork(sinksRight = FALSE, Links = links, Nodes = nodes,
              Source = 'source', Target = 'target', Value = 'value',
              NodeID = 'name', nodePadding = 1,
              LinkGroup = "group", NodeGroup = "group", fontSize = 8.7)
```

My current output

As you can see, D3, the drug taken at TX1, is not shown because it has no target, but it should be, since a patient took it. Is there any way to show it?
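A quick base-R check (my own sketch, not from the original post) makes the lost counts visible: it counts the non-empty treatment columns per patient and tabulates patients per drug and step straight from `df`, instead of from the links. It assumes the sample data above; `strip.white = TRUE` trims the indentation of the inline CSV, and the names `filled`, `n_steps`, `long` and `counts` are only illustrative.

```r
# Sample data from the question; strip.white trims the indentation
df <- read.csv(header = TRUE, as.is = TRUE, strip.white = TRUE,
               text = 'tx1,tx2,tx3,tx4
               D1,D2,D3,D4
               D2,D1,,
               D1,,,
               D3,,,')

filled <- df != "" & !is.na(df)   # logical matrix: is a treatment present?
n_steps <- rowSums(filled)        # number of treatments per patient
n_steps                           # rows 3 and 4 have one treatment each

# Patients per drug at each step, counted from the data rather than the links.
# unlist() walks the data frame column by column, hence rep(..., each = ).
long <- data.frame(step = rep(names(df), each = nrow(df)),
                   drug = unlist(df, use.names = FALSE),
                   stringsAsFactors = FALSE)
long <- long[!is.na(long$drug) & long$drug != "", ]
counts <- table(long$step, long$drug)
counts
```

Patients 3 and 4 each have a single treatment, so the link-building code produces no link for them at all. That is why D3 at TX1 disappears, even though `counts["tx1", "D3"]` is 1 and `counts["tx1", "D1"]` is 2.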

Edited by s__; asked by sammy.
  • No, not without faking it. These Sankey charts are fundamentally about flows/links, so a node only exists as the start and/or end of a flow/link. The only way to achieve this is to add some kind of termination node so that the D3 node can be at the start of a fake link that leads to the fake termination node. – CJ Yetman Aug 05 '21 at 11:18
  • Is there any other plot that could represent this the way I need? – sammy Aug 05 '21 at 11:21
  • The force network plots in the networkD3 package can show nodes that are detached from the network, but generally speaking I would look for a plot that is non-network oriented, because the concept you have here doesn’t seem to look much like network data. – CJ Yetman Aug 05 '21 at 18:49
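The termination-node workaround from the first comment could look roughly like the following base-R sketch. The fake node name `Stopped` is hypothetical, a `Stopped_k` node is added only for patients whose journey ends before the last column, and the code assumes each patient's treatments fill the columns from left to right without gaps. The `networkD3::sankeyNetwork()` call only runs if the package is installed.

```r
df <- read.csv(header = TRUE, as.is = TRUE, strip.white = TRUE,
               text = 'tx1,tx2,tx3,tx4
               D1,D2,D3,D4
               D2,D1,,
               D1,,,
               D3,,,')

# One vector of step-suffixed node names per patient (row),
# plus a fake "Stopped_k" node when the journey ends early
paths <- lapply(seq_len(nrow(df)), function(i) {
  tx <- as.character(unlist(df[i, ], use.names = FALSE))
  tx <- tx[!is.na(tx) & tx != ""]
  if (length(tx) == 0) return(character(0))
  path <- paste0(tx, "_", seq_along(tx))
  if (length(tx) < ncol(df)) {
    path <- c(path, paste0("Stopped_", length(tx) + 1))
  }
  path
})

# Consecutive steps become links; each patient contributes a value of 1
pairs <- do.call(rbind, lapply(paths, function(p) {
  if (length(p) < 2) return(NULL)
  data.frame(source = head(p, -1), target = tail(p, -1), value = 1,
             stringsAsFactors = FALSE)
}))

# Sum identical source -> target transitions across patients
links <- aggregate(value ~ source + target, data = pairs, FUN = sum)
links$source <- as.character(links$source)
links$target <- as.character(links$target)
links$group <- sub("_[0-9]+$", "", links$source)  # colour links by source drug

# Nodes: every name used by a link, plus a display label without the suffix
nodes <- data.frame(name = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)
nodes$label <- sub("_[0-9]+$", "", nodes$name)
nodes$group <- nodes$label

# networkD3 expects zero-based node indices
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1

if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                Source = "source", Target = "target",
                                Value = "value", NodeID = "label",
                                NodeGroup = "group", LinkGroup = "group",
                                sinksRight = FALSE)
}
```

With the sample data, D1 at step 1 now carries 2 patients (one continuing to D2, one flowing into `Stopped_2`), and D3 at step 1 appears with a single link to `Stopped_2`. Print `p` to view the chart. Unlike the question's code, link values here are patient counts rather than being divided by the number of outgoing links, and keeping the step suffix on the stop nodes shows at which step patients drop out.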

0 Answers