I'm trying to create a Sankey diagram that tracks each patient's treatment journey. The problem is that some patients stop after the first treatment and never go on to a second.
I'm using networkD3, and I'm forced to remove every link and node that has no target, which results in lower counts than the real number of patients.
Here's the sample data and code that I'm using right now.
library(dplyr)
library(tidyr)
library(networkD3)

df <- read.csv(header = TRUE, as.is = TRUE,
text = 'tx1,tx2,tx3,tx4
D1,D2,D3,D4
D2,D1,,
D1,,,
D3,,,')
View(df)
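# Reshape to one row per patient per treatment line, then pair each
# treatment (source) with the next one for the same patient (target).
links <-
  df %>%
  mutate(row = row_number()) %>%
  gather('column', 'source', -row) %>%
  mutate(group = source) %>%
  mutate(column = match(column, names(df))) %>%
  group_by(row) %>%
  arrange(column) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  # Rows without a next treatment are dropped here, which is where
  # patients who stop after one treatment get lost.
  filter(!is.na(target)) %>%
  filter(target != "")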
View(links)
links <-
  links %>%
  mutate(source = paste0(source, '_', column)) %>%
  mutate(target = paste0(target, '_', column + 1)) %>%
  select(source, target, group)
View(links)
nodes <- data.frame(name = unique(c(links$source, links$target)))
View(nodes)
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
links$value <- 1
# Count outgoing links per source node so each link's value can be
# scaled to 1 / (number of links leaving that node).
grp_2 <-
  links %>%
  count(source) %>%
  rename(n_source = n)
View(grp_2)
links <- merge(x = links, y = grp_2, by = "source", all = TRUE)
links$value <- links$value/links$n_source
View(links)
nodes$name <- sub('_[0-9]+$', '', nodes$name)
nodes$group <- nodes$name
View(nodes)
View(links)
sankeyNetwork(sinksRight = FALSE, Links = links, Nodes = nodes,
              Source = 'source', Target = 'target', Value = 'value',
              NodeID = 'name', nodePadding = 1,
              LinkGroup = 'group', NodeGroup = 'group', fontSize = 8.7)
As you can see, D3 taken at tx1 is not shown because it has no target, but it should be, since a patient did take it. Is there any way to do this?
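To make the loss concrete, here is a quick base-R check on the same sample data. It is separate from the pipeline above and only a sketch: the names `has_second` and `dropped` are made up for illustration, and it assumes that a blank `tx2` means the patient stopped after the first treatment.

```r
# Same sample data as in the question
df <- read.csv(header = TRUE, as.is = TRUE,
text = 'tx1,tx2,tx3,tx4
D1,D2,D3,D4
D2,D1,,
D1,,,
D3,,,')

# TRUE when the patient went on to a second treatment
# (assumption: a blank or NA tx2 means they stopped after tx1)
has_second <- !is.na(df$tx2) & df$tx2 != ""

# First treatments of patients who never produce a link,
# and so are missing from the link counts
dropped <- df$tx1[!has_second]
print(table(dropped))
```

Any drug that shows up here but has no tx1 link from another patient (D3 in this data) disappears from the diagram entirely.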