I have a dataset containing the text that users entered in a text field on a website. Because of the nature of the site, most users wrote in the field multiple times. I want to find out whether there are patterns, for instance whether users who at some point wrote "A" later write "B".
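Here is a minimal sketch of the "A, then later B" count. It uses plain Python rather than TraMineR and assumes each user's inputs are already available as a chronologically ordered list. The dictionary layout and the name `count_followed_by` are hypothetical. Each user contributes at most once per ordered pair, so a count reads as "number of users who wrote A and later wrote B":

```python
from collections import Counter


def count_followed_by(sequences):
    """For each ordered pair (a, b) with a != b, count the users who wrote a
    at some point and b at a strictly later point (once per user per pair)."""
    counts = Counter()
    for seq in sequences.values():
        seen = set()   # inputs this user has written so far
        pairs = set()  # distinct (earlier, later) pairs for this user
        for b in seq:
            for a in seen:
                if a != b:
                    pairs.add((a, b))
            seen.add(b)
        counts.update(pairs)
    return counts


# Hypothetical toy data: user id -> inputs in chronological order.
sequences = {"u1": ["A", "C", "B"], "u2": ["A", "B"], "u3": ["B", "A"]}
counts = count_followed_by(sequences)
print(counts[("A", "B")])  # 2 (u1 and u2)
```

With about 80,000 distinct inputs the space of possible pairs is enormous, but only pairs that actually occur are stored. In practice you would probably keep only pairs reached by some minimum number of users.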

After some googling I found TraMineR, a library for this kind of analysis. However, it seems that TraMineR and/or R imposes a maximum on the number of states. Is this true, or am I doing something wrong? What is the best way to approach my problem?

Some more information about my dataset:

  • More than a million logged text inputs
  • About 90,000 different users
  • About 80,000 different inputs (events/states?)
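The figures above come from one log row per input. This sketch groups the logs into one ordered sequence per user and recomputes the three counts. It assumes each row is a `(user_id, timestamp, text)` tuple, which is a hypothetical layout, and the function names are illustrative:

```python
from collections import defaultdict


def build_sequences(logs):
    """Group (user_id, timestamp, text) rows into one chronologically
    ordered list of inputs per user."""
    by_user = defaultdict(list)
    for user_id, timestamp, text in logs:
        by_user[user_id].append((timestamp, text))
    return {
        user: [text for _, text in sorted(entries, key=lambda e: e[0])]
        for user, entries in by_user.items()
    }


def summarize(sequences):
    """Return (number of logs, number of users, number of distinct inputs)."""
    n_logs = sum(len(seq) for seq in sequences.values())
    n_users = len(sequences)
    n_states = len({text for seq in sequences.values() for text in seq})
    return n_logs, n_users, n_states


# Hypothetical toy logs, deliberately out of order for user u1.
logs = [("u1", 2, "B"), ("u1", 1, "A"), ("u2", 5, "A")]
sequences = build_sequences(logs)
print(sequences)             # {'u1': ['A', 'B'], 'u2': ['A']}
print(summarize(sequences))  # (3, 2, 2)
```

Normalizing the text first (for example with `text.strip().lower()`) is one cheap way to merge trivially different inputs before counting distinct states.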

To turn my data into a state sequence object, I need seqe2stm() from TraMineRextras (as explained here). In my case it has to handle more than 80,000 events. Running the function gives me this error:

Error in matrix(TRUE, nrow = nbstate, ncol = nevent) :
invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(TRUE, nrow = nbstate, ncol = nevent) :
NAs introduced by coercion to integer range
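The warning "NAs introduced by coercion to integer range" means that the value passed as `nrow` (`nbstate`) was larger than R's largest integer, 2^31 - 1 (`.Machine$integer.max`), so it became NA. The question does not say how many states seqe2stm derives from a list of events. This arithmetic sketch therefore only shows how quickly a state count that grows with combinations of events passes that limit at 80,000 events. The three growth rates are illustrative assumptions, not seqe2stm's actual formula:

```python
R_INTEGER_MAX = 2**31 - 1  # value of .Machine$integer.max in R (2147483647)

n_events = 80_000

# Illustrative (assumed) ways a state space could grow with the number of events.
candidates = {
    "one state per event": n_events,
    "one state per ordered pair of events": n_events ** 2,
    "one state per subset of events": 2 ** n_events,
}

for label, n_states in candidates.items():
    verdict = "exceeds" if n_states > R_INTEGER_MAX else "fits in"
    print(f"{label}: {verdict} R's integer range")

# A pairwise state space already overflows beyond 46,340 events,
# and a subset-based one beyond 30 events.
```

Whatever the exact formula, any state space built from combinations of 80,000 events is far too large to allocate. That is why reducing the number of distinct inputs first is usually necessary.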

  • Show us what code you have tried. That is how Q&A works here... – oz123 Jan 23 '16 at 16:14
  • I don't think this is a programming issue: you should ask on CV (Cross Validated) instead. Sequence analysis is not suited for that kind of analysis. It compares whole sequences, not just states; with so many states, almost every sequence will be unique, and no meaningful comparison will be possible. My suggestion would be to reduce the complexity beforehand (for instance, through a topic model of the input that reduces the number of possible states). – scoa Jan 23 '16 at 16:23
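The comment suggests reducing the number of states beforehand, for instance with a topic model. The sketch below substitutes a much cruder technique, not a topic model: it keeps the `k` most frequent inputs and lumps everything else into a single `OTHER` state. The function names and the `OTHER` label are hypothetical, and `OTHER` must not collide with a real input. The sketch also measures the share of users whose whole sequence is unique, which is the problem the comment describes:

```python
from collections import Counter


def lump_rare_inputs(sequences, k, other="OTHER"):
    """Keep the k most frequent inputs; replace every other input with `other`."""
    freq = Counter(text for seq in sequences.values() for text in seq)
    keep = {text for text, _ in freq.most_common(k)}
    return {
        user: [text if text in keep else other for text in seq]
        for user, seq in sequences.items()
    }


def share_unique(sequences):
    """Fraction of users whose full sequence occurs for no other user."""
    counts = Counter(tuple(seq) for seq in sequences.values())
    unique = sum(1 for seq in sequences.values() if counts[tuple(seq)] == 1)
    return unique / len(sequences)


# Hypothetical toy data: rare inputs x1..x3 make every sequence unique.
sequences = {
    "u1": ["A", "x1"],
    "u2": ["A", "x2"],
    "u3": ["B", "x3"],
    "u4": ["A", "B"],
}
lumped = lump_rare_inputs(sequences, k=2)
print(share_unique(sequences))  # 1.0
print(share_unique(lumped))     # 0.5 (u1 and u2 now share ["A", "OTHER"])
```

If the share of unique sequences stays close to 1 even after lumping, sequence comparison will have little to work with. A topic model or a manual categorization would group inputs by meaning rather than by frequency.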
