Questions tagged [sequence-analysis]

Sequence analysis (in the social sciences) is the analysis of how people or other units of study move from one state to another (for example, single-->married-->widowed, unemployed-->employed-->retired) over the course of their lifespan.

35 questions
1 vote, 1 answer

Is TraMineR appropriate for data with different sequence length?

My data has the sequence of each student's page visit behaviors during a learning session. For example (below), Student 1 read instructions, visited three pages ("Visit-Visit-Visit"), and revisited one of the pages ("Revisit"). Student 2 read…
jakeM
1 vote, 1 answer

Remove missing data state ‘%’ when using TraMineR’s seqpcplot() function

I am trying to conduct event sequence analysis on longitudinal survey data. I want to create a plot which looks like this (pg. 44 of https://www.researchgate.net/publication/279560802_Exploratory_mining_of_life_event_histories), which I believe was…
Misc584
1 vote, 1 answer

Sequence analysis clustering CHI2 EUCLID error

I am quite new to sequence analysis and trying to identify clusters in an aggregated sequence matrix, focusing on the state duration. However, when using method='CHI2'/'EUCLID' combined with step=1 (not otherwise) I am getting the error: Error in…
Rico
1 vote, 1 answer

Setting the "tpow" and "expcost" arguments in TraMineR::seqdist

I'm working on the pathways of inpatients during their hospital stay. These pathways are represented as state sequences (the current medical unit at each time unit), and I'm trying to find typical pathways through clustering algorithms. I…
L. Trutt
1 vote, 1 answer

How to compute dissimilarities between sequences when sequences contain gaps?

I want to cluster sequences with optimal matching with TraMineR::seqdist() from data that contains missings, i.e. sequences containing gaps.
    library(TraMineR)
    data(ex1)
    sum(is.na(ex1))
    # [1] 38
    sq <- seqdef(ex1[1:13])
    sq
    # Sequence …
jay.sf
1 vote, 1 answer

How to introduce noise into sequence data using TraMineR?

I want to randomly change states in a sequence dataset for the purposes of simulation. The goal is to see how different measures of cluster quality behave with different degrees of structure in the data. If I were to introduce missings, there is the…
Kenji
1 vote, 1 answer

How to test if two lift values are significantly different from each other?

Consider this code:
    # Load libraries
    library(RCurl)
    library(TraMineR)
    library(PST)
    # Get data
    x <-…
histelheim
1 vote, 0 answers

Comparing log-loss values for a probabilistic suffix tree?

In the PST package one can estimate the prediction quality of individual sequences using the log-loss, e.g.:
    R> ex2 <- c("a-a-b", "a-b-a-a-b", "b-b-b-b-a")
    R> ex2 <- seqdef(ex2)
    R> predict(S1.p1, ex2, output = "logloss")
        logloss
    [1] 0.9183
    [2]…
histelheim
1 vote, 1 answer

Meaning of lag parameter in PST?

In the pmine() function in PST you can use lags. What is this lag? Does it mean that it ignores the lag first positions in the sequence? Or does it mean that you allow for lags within the subsequences? From the documentation it is hard to understand…
histelheim
1 vote, 1 answer

What is the meaning of alpha in the context of an information gain pruning function?

In the PST package we use the value C as a cut-off for the information gain function used to prune the tree. The C value, for an alpha of 0.05, is calculated as follows:
    C95 <- qchisq(0.95, 1) / 2
What does it mean that the C value is based on an…
histelheim
1 vote, 1 answer

R - need help putting matrix into basket or transaction form

      Server  Epoch       A  B  C  D  E
    1 C301    1420100400  1  0  1  0  0
    2 C301    1420100700  0  0  0  0  0
    3 C301    1420152000  0  1  0  0  0
    4 C301    1420238100  1  1  1  0  0
    5 C301    1420324500  1  1  1  1  1
I need help getting the matrix above into basket or…
qman
1 vote, 0 answers

Sequence Mining using arulesSequence package in R

I am trying to learn about Sequence Mining, and I ran the following code from wikibooks as an example. The cspade function has taken over 30 minutes to run (and is still running) when the example shows that it should take less than a second. Does…
1 vote, 2 answers

TraMineR substitution cost

I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR. I will try to give a simple example (very simple, but I hope useful to explain my problem): There are three sequences and…
Giampiero
1 vote, 1 answer

How to address void elements overwhelming analysis?

I'm conducting some analysis on sequence data with very different lengths using TraMineR. What ends up happening is that the void elements (%) used to make the sequences equally long end up overwhelming everything else: seqstatf(cluster1_data) …
histelheim
1 vote, 1 answer

TraMineR:::seqerules help page?

Is there a help page for TraMineR:::seqerules? I cannot seem to find it, either in the package or online. The lack of this help page makes the output somewhat difficult to interpret. For example, what do the Conf and Lift columns specify? Below is…
histelheim