I have a dataset containing the text that users entered in a text field on a website. Because of the nature of the website, most users wrote in the field multiple times. Now I want to see whether there are patterns, for instance, that users who wrote "A" at some point later write "B".
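To make the kind of pattern concrete, here is a minimal sketch in Python (not TraMineR) that counts, for each ordered pair of texts, how many users wrote the first one and later the second. The toy log and the function name `count_followed_by` are hypothetical, and the sketch assumes each user's timestamps are distinct.

```python
from collections import defaultdict

# Hypothetical toy log: (user_id, timestamp, text) tuples.
logs = [
    ("u1", 1, "A"), ("u1", 2, "C"), ("u1", 3, "B"),
    ("u2", 1, "A"), ("u2", 5, "B"),
    ("u3", 1, "B"), ("u3", 2, "A"),
]

def count_followed_by(logs):
    """For each ordered pair (x, y) with x != y, count the users who wrote
    x and later wrote y. Assumes distinct timestamps within each user."""
    by_user = defaultdict(list)
    for user, t, text in logs:
        by_user[user].append((t, text))

    counts = defaultdict(int)
    for events in by_user.values():
        events.sort()          # chronological order for this user
        seen = set()           # texts this user wrote earlier
        pairs = set()          # count each pair at most once per user
        for _, text in events:
            for earlier in seen:
                if earlier != text:
                    pairs.add((earlier, text))
            seen.add(text)
        for pair in pairs:
            counts[pair] += 1
    return dict(counts)

print(count_followed_by(logs))
```

Only pairs that actually occur are stored, so memory grows with the observed pairs rather than with all possible pairs of the distinct inputs.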
After some googling I found TraMineR as a library for this kind of analysis. But it seems that TraMineR and/or R sets a maximum on the number of states. Is this true, or am I doing something wrong? What is the best way to approach my problem?
Some more information about my dataset:
- There are more than a million logs of text input
- About 90000 different users
- About 80000 different inputs (events/states?)
To create a state sequence object from my data, I need to use seqe2stm() from TraMineRextras (as explained here), where the number of my events is over 80000. Running the function gives me this error:
Error in matrix(TRUE, nrow = nbstate, ncol = nevent) :
invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(TRUE, nrow = nbstate, ncol = nevent) :
NAs introduced by coercion to integer range
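The warning indicates that the value passed as `nrow` was larger than R's largest integer, 2^31 - 1 (`.Machine$integer.max`), so it was coerced to NA. A back-of-the-envelope check in Python shows how quickly a logical `nbstate x nevent` matrix outgrows this limit and memory. The state counts used here are hypothetical, since exactly how seqe2stm derives `nbstate` is not shown in the error; the helper `check_matrix` is also hypothetical.

```python
# R's largest integer (.Machine$integer.max); larger nrow values become NA.
R_INT_MAX = 2**31 - 1
R_LOGICAL_BYTES = 4  # R stores each logical value in 4 bytes

def check_matrix(n_state, n_event):
    """Return (nrow_fits, n_bytes) for a logical n_state x n_event matrix in R."""
    nrow_fits = n_state <= R_INT_MAX
    n_bytes = n_state * n_event * R_LOGICAL_BYTES
    return nrow_fits, n_bytes

n_event = 80_000
# Hypothetical: as many states as events.
print(check_matrix(n_event, n_event))       # (True, 25600000000)
# Hypothetical: one state per ordered pair of events.
print(check_matrix(n_event**2, n_event))    # (False, 2048000000000000)
```

Even in the optimistic first case, where `nrow` is valid, the matrix would need about 25.6 GB, so reducing the number of distinct events (for example by grouping similar inputs) seems necessary before building such a matrix.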