I'd like to do sequence analysis in R, and I'm trying to convert my data into a usable form for the arulesSequences package.
library(tidyverse)
library(arules)
library(arulesSequences)
df <- data_frame(personID = c(1, 1, 2, 2, 2),
                 eventID = c(100, 101, 102, 103, 104),
                 site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
                 sequence = c(1, 2, 1, 2, 3))
df.trans <- as(df, "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
seq <- cspade(df.trans, parameter = list(support = 0.4), control = list(verbose = TRUE))
If I leave my columns as their original classes, as above, I get an error:
Error in asMethod(object) :
column(s) 1, 2, 3, 4 not logical or a factor. Discretize the columns first.
However, if I convert the columns to factors, I get another error:
df <- data_frame(personID = c(1, 1, 2, 2, 2),
                 eventID = c(100, 101, 102, 103, 104),
                 site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
                 sequence = c(1, 2, 1, 2, 3))
df <- as.data.frame(lapply(df, as.factor))
df.trans <- as(df, "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
seq <- cspade(df.trans, parameter = list(support = 0.4), control = list(verbose = TRUE))
Error in asMethod(object) :
In makebin(data, file) : 'eventID' is a factor
Any advice on getting around this, or on sequence mining in R in general, is much appreciated. Thanks!