I have a dataset that reads like a log file, with one row per user interaction with a website. I'm trying to visualize this data to show the most common sequences/pathways through the site (no, I do not have access to Google Analytics, just a data dump). I've been able to distill the data down to a format that gives, for each page, the number of times it was the first, second, third, etc. page visited.
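For illustration, one way to get a Page/Position/Count table out of a raw log is to number each session's page views in time order and then count page-position pairs. This is a minimal base-R sketch, not necessarily how the table here was produced; the column names `session_id`, `timestamp` and `page` and the toy rows are hypothetical.

```r
# Hypothetical raw log: one row per page view (column names are assumptions)
views <- data.frame(
  session_id = c("a", "a", "a", "b", "b", "c"),
  timestamp  = c(1, 2, 3, 1, 2, 1),
  page       = c("Home", "About", "Contact", "Home", "Products", "About")
)

# Sort by session, then by time, so positions follow visit order
views <- views[order(views$session_id, views$timestamp), ]

# Position = 1 for the first page in each session, 2 for the second, ...
views$position <- ave(seq_along(views$page), views$session_id, FUN = seq_along)

# Count how often each page appears at each position
page_counts <- aggregate(count ~ page + position,
                         data = transform(views, count = 1),
                         FUN = sum)
page_counts
```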
I thought I might create an alluvial plot (using ggalluvial) stratified by the sequential position. I've roughed together a version of what I'm going for:
Here is a way to generate some sample data that is structured like mine:
```r
set.seed(42)  # make the random sample reproducible

# 5 pages x 6 positions = 30 rows, one count per page/position pair
pages <- rep(c("Home", "About", "People", "Contact", "Products"), each = 6)
positions <- rep(1:6, times = 5)  # positions 1-6 for every page
counts <- sample(1:100, 30)

df <- data.frame(Page = pages, Position = positions, Count = counts)
```
But I can't get ggalluvial to use a single column (Position) as the repeated axes, with Page as the strata at each one, if that makes sense. Here's what I've got, but it's not much to go on:
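```r
library(ggplot2)
library(ggalluvial)

# Current attempt: Page on the first axis, Position on the second
ggplot(df,
       aes(axis1 = Page,
           axis2 = Position,
           y = Count)) +
  geom_alluvium() +
  geom_stratum() +
  # label.strata = TRUE is deprecated; map the stratum name to the label instead
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  theme_minimal()
```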
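ggalluvial's flows connect a stratum on one axis to a stratum on the next, so drawing them requires knowing which page followed which within a session; a Page/Position/Count table only records how often each page sits at each position. As a hedged sketch, assuming the per-session page views are still available in the raw dump (the column names and rows below are hypothetical), the first three steps of each session can be pivoted into one column per step and identical paths counted:

```r
# Hypothetical raw log: one row per page view (column names are assumptions)
views <- data.frame(
  session_id = rep(c("a", "b", "c"), each = 3),
  timestamp  = rep(1:3, times = 3),
  page       = c("Home", "About", "Contact",
                 "Home", "About", "Contact",
                 "Home", "Products", "Contact")
)

# Number each session's page views in time order
views <- views[order(views$session_id, views$timestamp), ]
views$step <- ave(seq_along(views$page), views$session_id, FUN = seq_along)

# Keep the first three steps only (an arbitrary cutoff for this sketch)
views <- views[views$step <= 3, c("session_id", "step", "page")]

# Pivot to one row per session with columns page1, page2, page3
wide <- reshape(views, idvar = "session_id", timevar = "step",
                direction = "wide", sep = "")

# Count sessions per exact three-page path; sessions with fewer than
# three steps would have NAs and are dropped by aggregate's default na.omit
wide$n <- 1
paths <- aggregate(n ~ page1 + page2 + page3, data = wide, FUN = sum)
paths
```

A table shaped like `paths` matches the wide form that the `axis1`, `axis2`, ... aesthetics take, e.g. `aes(axis1 = page1, axis2 = page2, axis3 = page3, y = n)`.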
This is just something that I have been trying. If you know of a better way to visualize this information, I'm all ears.
Thank you in advance.