
I'm analyzing sequence data of very different lengths with TraMineR. The void elements (%) used to pad the sequences to equal length end up overwhelming everything else:

seqstatf(cluster1_data)

             Freq      Percent
%          377623 98.366219930
assigned       16  0.004167806
closed       1115  0.290444002
discussed    2454  0.639237291
mentioned     954  0.248505451
merged        421  0.109665403
opened        534  0.139100535
referenced    565  0.147175660
reopened       22  0.005730734
reviewed      191  0.049753188

How can I avoid this effect?

histelheim
  • I guess cluster_data is a state sequence object. With which options did you create it? Is the `%` in your original data? – Gilbert Jan 28 '15 at 15:52

1 Answer


The void (%) signs came from NAs in my original data.

The problem was that I called seqdef twice: first on the raw data, and then again on the resulting sequence object. Somehow this negated my use of the missing=TRUE and right="DEL" flags.

Here's how I set the seqdef function to discount missing data during the analysis:

seqdef(data, right = "DEL")
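
For illustration, here is a minimal sketch of the single-call approach. The toy data frame `raw` and its column names are hypothetical, not my actual data; it only assumes that shorter sequences are padded with NA on the right. The sequence object is built once from the raw data, so the trailing NAs should be handled as void rather than counted as a state:

```r
library(TraMineR)

# Hypothetical toy data: three sequences of unequal length,
# padded with NA on the right (not the data from the question).
raw <- data.frame(
  t1 = c("opened", "opened", "opened"),
  t2 = c("discussed", "closed", "discussed"),
  t3 = c("merged", NA, "discussed"),
  t4 = c(NA, NA, "closed"),
  stringsAsFactors = FALSE
)

# Build the state sequence object ONCE, directly from the raw data.
# right = "DEL" asks seqdef to drop the trailing missing values.
seqs <- seqdef(raw, right = "DEL")

# Overall state frequencies of the sequence object.
seqstatf(seqs)

# Do not pass `seqs` back into seqdef(); that second call is what
# caused the problem described above.
```

If `%` still shows up in the seqstatf output, checking `alphabet(seqs)` is a quick way to see whether it has been picked up as a regular state.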
histelheim