I have a dataset that reads like a log file showing each user interaction with a website. I'm trying to visualize this data to show the most common sequences/pathways through the site (no, I do not have access to Google Analytics - just a data dump.) I've been able to distill the data down to a format that contains the page and the number of times it is the first, second, third page visited, etc.

I thought I might create an alluvial plot (using ggalluvial) stratified by the sequential position. I've roughed together a version of what I'm going for:

Sample alluvial plot

Here is a way to generate some sample data that is structured like mine:

# Five pages, each observed once at each of five positions (25 rows)
pages <- rep(c("Home", "About", "People", "Contact", "Products"), each = 5)
positions <- rep(1:5, times = 5)
counts <- sample(1:100, 25)

df <- data.frame(Page = pages, Position = positions, Count = counts)

But I cannot seem to get ggalluvial to accept a single column as repeated strata, if that makes sense. Here's what I've got, but it's not much to go on:

library(ggalluvial)
ggplot(df, 
       aes(axis1 = Page,
           axis2 = Position,
           y = Count)) +
  geom_alluvium() +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  theme_minimal()

This is just something that I have been trying. If you know of a better way to visualize this information, I'm all ears.

Thank you in advance.

Rob E.
  • It looks like your data don't include information about which visitors accessed the pages in each sequence, only the number of times each page was accessed at each position in the sequence. Is that right? If so, then the data may not be suitable for an alluvial plot, which tracks individuals or groups across multiple axes; in this case the axes would, I think, be positions. – Cory Brunson Jun 26 '21 at 14:08
  • To complement @CoryBrunson's comment: if you have the "visitor" information, you can convert the alluvial data into the lodes (long) format. In this format each row corresponds to one lode, defined by an axis (the "Position" column) and a stratum (one of `c("Home", "About", "People", "Contact", "Products")`). – DoRemy95 Nov 18 '21 at 12:49
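
The lodes-format suggestion in the comments could look roughly like the sketch below. It assumes a per-visitor table, which the question's data dump may or may not provide: the `visits` data frame and its `Visitor` column are hypothetical and simulated here purely for illustration. Each row is one page view, with `Position` as the axis, `Page` as the stratum, and `Visitor` as the alluvium.

```r
library(ggalluvial)  # also attaches ggplot2

# Hypothetical per-visitor data: one row per (Visitor, Position) page view.
# The Visitor column is an assumption; the aggregated counts in the
# question do not contain it.
set.seed(42)
page_names  <- c("Home", "About", "People", "Contact", "Products")
n_visitors  <- 50
n_positions <- 3

visits <- data.frame(
  Visitor  = rep(seq_len(n_visitors), each = n_positions),
  Position = factor(rep(seq_len(n_positions), times = n_visitors)),
  Page     = factor(
    sample(page_names, n_visitors * n_positions, replace = TRUE),
    levels = page_names
  )
)

# Lodes (long) format: x is the axis, stratum is the page,
# alluvium identifies the visitor followed across positions.
p <- ggplot(visits,
            aes(x = Position, stratum = Page, alluvium = Visitor,
                fill = Page, label = Page)) +
  geom_flow() +
  geom_stratum(alpha = 0.5) +
  geom_text(stat = "stratum", size = 3) +
  theme_minimal()

p
```

With real data, the simulated `visits` table would be replaced by the raw log reshaped to one row per visitor and position. Without a visitor identifier, the flows between positions cannot be recovered from the per-position counts alone.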

0 Answers