I have a follow-up question to an earlier geom_flow question.

When I duplicate the rows, ggalluvial can no longer generate the flow charts shown in that discussion:

library(ggplot2)
library(ggalluvial)

set.seed(42)
individual <- as.character(rep(1:20, each = 5))
timeperiod <- paste0(rep(c(0, 18, 36, 54, 72), 20), "_week")
therapy <- factor(sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"), 100, replace = TRUE))

d <- data.frame(individual, timeperiod, therapy)
# duplicate every row
d <- rbind(d, d)

ggplot(d, aes(x = timeperiod, stratum = therapy, alluvium = individual, fill = therapy, label = therapy)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", lode.guidance = "rightleft", color = "darkgray") +
  geom_stratum()

However, I get the following error message:

Error in geom_flow():
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in setup_data():
! Data is not in a recognized alluvial form (see help('alluvial-data') for details).
Run rlang::last_error() to see where the error occurred.
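A likely reason for the error, going by the lodes-form description in help('alluvial-data'), is that each alluvium may appear at most once per axis, whereas rbind(d, d) gives every individual two rows per time period. A minimal base-R sketch to check for such repeated pairs (the name d_dup is introduced here only for illustration):

```r
set.seed(42)
individual <- as.character(rep(1:20, each = 5))
timeperiod <- paste0(rep(c(0, 18, 36, 54, 72), 20), "_week")
therapy <- factor(sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"),
                         100, replace = TRUE))
d <- data.frame(individual, timeperiod, therapy)

# Original data: every (individual, timeperiod) pair occurs exactly once
dup_before <- anyDuplicated(d[, c("individual", "timeperiod")])

# After duplicating the rows, every pair occurs twice
d_dup <- rbind(d, d)
dup_after <- anyDuplicated(d_dup[, c("individual", "timeperiod")])

dup_before      # 0 means no repeated pair
dup_after > 0   # TRUE means at least one repeated pair
```

anyDuplicated() returns 0 when no pair repeats and the row index of the first repeat otherwise, so a non-zero result points at the rows that need to be aggregated or disambiguated before plotting.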

I tried to transform the dataset to lodes form with:

library(dplyr)  # provides %>%

d <- d %>%
  to_lodes_form(key = "x",
                value = "stratum",
                id = "alluvium",
                axes = 3)

The test for lodes form returns TRUE:

is_lodes_form(d,
              key = "x", value = "stratum", id = "alluvium")

When I plug stratum and alluvium into the plot, I get only the bars for the strata; the lodes are missing.

ggplot(d, aes(x = timeperiod, stratum = stratum, alluvium = alluvium, fill = stratum, label = stratum)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", lode.guidance = "rightleft", color = "darkgray") +
  geom_stratum() 
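One possible explanation, offered as an assumption about how to_lodes_form() behaves rather than a confirmed diagnosis: with a single axis (axes = 3), the generated alluvium id appears to be assigned per row, so no id occurs at more than one time period and there is nothing for a flow to connect. A short sketch to inspect this (lodes and rows_per_alluvium are illustrative names):

```r
library(ggalluvial)

set.seed(42)
individual <- as.character(rep(1:20, each = 5))
timeperiod <- paste0(rep(c(0, 18, 36, 54, 72), 20), "_week")
therapy <- factor(sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"),
                         100, replace = TRUE))
d <- rbind(data.frame(individual, timeperiod, therapy),
           data.frame(individual, timeperiod, therapy))

# Same call as in the question, stored under a new name
lodes <- to_lodes_form(d, key = "x", value = "stratum", id = "alluvium", axes = 3)

# How many rows does each generated alluvium id have?
rows_per_alluvium <- table(lodes$alluvium)
max(rows_per_alluvium)

# How many distinct values does the generated key column hold?
length(unique(lodes$x))
```

If max(rows_per_alluvium) is 1, each alluvium lives at a single time period only, which would match a plot that shows strata but no flows.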

What should I change to display the lodes in the plot?

Philipp Schulz
  • Have you updated the `stratum` column for the duplicated rows? – Talha Asif Feb 16 '23 at 10:47
  • I don't know exactly what you mean by "updating the stratum column". Can you please explain it in other words? – Philipp Schulz Feb 16 '23 at 11:26
  • Meanwhile I found the problem, but no proper solution. If duplication is the only issue, it helps to compute a count by individual, timeperiod and therapy and to map that count to the y aesthetic. – Philipp Schulz Feb 22 '23 at 08:38
  • Another problem remains unsolved: what to do when the categories are not exclusive, i.e. when an individual has observations for two or more different therapies within the same time period? – Philipp Schulz Feb 22 '23 at 08:39

1 Answer

As I mentioned in the comments, one has to use a count as the y variable:

library(dplyr)
library(ggplot2)
library(ggalluvial)

set.seed(42)
individual <- as.character(rep(1:20, each = 5))
timeperiod <- paste0(rep(c(0, 18, 36, 54, 72), 20), "_week")
therapy <- factor(sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"), 100, replace = TRUE))
d <- data.frame(individual, timeperiod, therapy)
head(d)

# duplicate every row
d <- rbind(d, d)

# collapse the duplicates into one row per combination, with the count in n
d %>%
  count(individual, timeperiod, therapy) %>%
  ggplot(aes(x = timeperiod,
             y = n,
             stratum = therapy,
             alluvium = individual,
             fill = therapy,
             label = therapy)) +
  geom_flow() +
  geom_stratum()
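To see why counting helps, the sketch below (assuming dplyr is installed; counts is an illustrative name) checks that after count() each individual again has exactly one row per time period, with the duplication carried by n instead of by repeated rows:

```r
library(dplyr)

set.seed(42)
individual <- as.character(rep(1:20, each = 5))
timeperiod <- paste0(rep(c(0, 18, 36, 54, 72), 20), "_week")
therapy <- factor(sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"),
                         100, replace = TRUE))
d <- data.frame(individual, timeperiod, therapy)
d <- rbind(d, d)

# One row per (individual, timeperiod, therapy); n holds how often it occurred
counts <- d %>% count(individual, timeperiod, therapy)

nrow(counts)                                            # back to one row per pair
anyDuplicated(counts[, c("individual", "timeperiod")])  # 0 means no repeated pair
unique(counts$n)                                        # every row was duplicated once
```

This only restores uniqueness because each individual has a single therapy per time period in this example; with several therapies in the same period, the (individual, timeperiod) pairs would still repeat after counting.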

I will open a new question, since this one does not cover my original problem.

Philipp Schulz