
I have a dataframe with three categorical variables:

  • classification1 : "1", "2", "3", "4"
  • classification2 : "Low", "Medium", "High"
  • survival: "1" (alive), "2" (dead)

The data frame has one row per individual. Here are the first 20 rows:

classification1 <- c(4, 3, 1, 2, 3, 1, 2, 2, 2, 2, 1, 1, 4, 2, 2, 1, 2, 1, 3, 2)
classification2 <- c("Medium", "Medium", "Low", "High", "High", "Low", "Medium", "Medium", "High", "Low", "Low", "Low", "High", "High", "Medium", "Low", "Medium", "Low", "Medium", "Medium")
survival <- c(2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1)
df <- data.frame(classification1, classification2, survival)
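In this data, survival is stored as the numeric codes 1 and 2, and classification2 as plain strings. Below is a minimal sketch, assuming you want the labels from the variable list above, of recoding both as factors. ggplot2 treats a numeric variable mapped to fill as continuous, so a factor gives one fill colour per outcome. Explicit levels should also let the strata follow Low, Medium, High instead of alphabetical order. The names survival_f and classification2_f are illustrative only.

```r
# Codes from the question: 1 = alive, 2 = dead.
survival <- c(2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1)

# Labelled factor: a discrete variable, so fill gets one colour per level
# rather than a continuous colour gradient.
survival_f <- factor(survival, levels = c(1, 2), labels = c("alive", "dead"))

classification2 <- c("Medium", "Medium", "Low", "High", "High", "Low", "Medium",
                     "Medium", "High", "Low", "Low", "Low", "High", "High",
                     "Medium", "Low", "Medium", "Low", "Medium", "Medium")

# Explicit level order (Low < Medium < High) instead of the alphabetical
# order that factor() would otherwise use.
classification2_f <- factor(classification2, levels = c("Low", "Medium", "High"))

print(table(survival_f))
print(table(classification2_f))
```

Any value not listed in levels becomes NA, so checking anyNA() after recoding is a quick way to catch typos such as "Med" versus "Medium".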

I want to use ggalluvial with ggplot2 to build an alluvial plot like this one, but I can't figure out how:

[Image: example of the desired alluvial plot]

The following code returns an error (Error in FUN(X[[i]], ...) : object 'Freq' not found) because I don't know what "Freq" is supposed to be:

ggplot(data = df, aes(axis1 = classification1, axis2 = classification2, y = Freq)) +
  scale_x_discrete(limits = c("classification1", "classification2"), expand = c(.2, .05)) +
  geom_alluvium(aes(fill = survival)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

Any clues on how to format my data frame so that it works with ggalluvial?

B_slash_

1 Answer


You could aggregate your dataset, e.g. with dplyr::count(), which adds a column holding the number of individuals in each combination of the three variables:

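library(ggplot2)
library(ggalluvial)
library(dplyr)

# Collapse the individual-level rows into one row per combination of the
# three variables; the count of individuals goes into a column named "Freq".
df <- df %>% 
  count(classification1, classification2, survival, name = "Freq")

ggplot(data = df, aes(axis1 = classification1, axis2 = classification2, y = Freq)) +
  scale_x_discrete(limits = c("classification1", "classification2"), expand = c(.2, .05)) +
  # factor() makes survival discrete, so each outcome gets its own fill colour
  geom_alluvium(aes(fill = factor(survival))) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))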
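If you would rather not depend on dplyr, here is a minimal base R sketch that builds the same kind of frequency table. as.data.frame() on a table returns one row per combination plus a count column named "Freq"; many ggalluvial examples get their Freq column this way from built-in tables such as UCBAdmissions. The name freq_df is illustrative only.

```r
# Example data from the question (first 20 individuals).
classification1 <- c(4, 3, 1, 2, 3, 1, 2, 2, 2, 2, 1, 1, 4, 2, 2, 1, 2, 1, 3, 2)
classification2 <- c("Medium", "Medium", "Low", "High", "High", "Low", "Medium",
                     "Medium", "High", "Low", "Low", "Low", "High", "High",
                     "Medium", "Low", "Medium", "Low", "Medium", "Medium")
survival <- c(2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1)
df <- data.frame(classification1, classification2, survival)

# Cross-tabulate the three variables, then flatten the table back into a
# data frame with one row per combination and a "Freq" count column.
freq_df <- as.data.frame(table(df[c("classification1", "classification2", "survival")]))

# table() also lists combinations that never occur (Freq == 0); they add
# nothing to the plot, so drop them.
freq_df <- freq_df[freq_df$Freq > 0, ]

print(freq_df)
```

freq_df has the same three columns plus Freq, so it can replace df in the ggplot() call above. Because table() converts every column to a factor, survival is already discrete here.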

stefan