Suppose I have three sequences:
dat <- list(Seq1 = c("A", "B", "C", "D", "C", "A", "C", "D", "A", "A", "B", "D"),
            Seq2 = c("C", "C", "B", "A", "D", "D", "A", "B", "C", "D", "B", "A", "D"),
            Seq3 = c("D", "A", "D", "A", "D", "B", "B", "A", "D", "A", "D", "A"))
These sequences are stored in three different CSV files. I want to estimate a first-order Markov chain from the aggregated data.
t=matrix(nrow = length(actionsoverall),ncol = length(actionsoverall),0)
for(i in files){
y=read.csv(i)$x
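yy=match(y, actionsoverall)  # map labels to indices; assumes actionsoverall holds the state labels, e.g. c("A","B","C","D")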
for (j in 1:(length(y)-1)) {
t[yy[j],yy[j+1]]<-t[yy[j],yy[j+1]]+1
}
}
for (h in 1:length(actionsoverall)) {
t[h,]<-t[h,]/sum(t[h,])
}
What I actually want is to pool the transition counts from all of the files. For example, if A to B occurs 2 times in file 1, 1 time in file 2 and 3 times in file 3, and A occurs 10 times in total, then the probability of A to B is 6/10.
N.B. If I calculate the transition probabilities for each file separately and then average them, will the result be the same?
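
A minimal sketch of the pooled calculation described above, using the in-memory `dat` list in place of the CSV files. The names `states`, `count_transitions`, `pooled_counts`, `pooled_P` and `avg_P` are illustrative and not part of the original code. It also computes the per-file average so the two results can be compared.

```r
dat <- list(
  Seq1 = c("A", "B", "C", "D", "C", "A", "C", "D", "A", "A", "B", "D"),
  Seq2 = c("C", "C", "B", "A", "D", "D", "A", "B", "C", "D", "B", "A", "D"),
  Seq3 = c("D", "A", "D", "A", "D", "B", "B", "A", "D", "A", "D", "A")
)

# Assumption: the state space is every label seen in any sequence
states <- sort(unique(unlist(dat)))

# Count transitions inside ONE sequence as a states x states table.
# Using a shared set of factor levels keeps rows/columns aligned across files.
count_transitions <- function(s) {
  f <- factor(s, levels = states)
  table(from = f[-length(f)], to = f[-1])
}

# Pool the counts; no transition is counted across a file boundary
counts_per_seq <- lapply(dat, count_transitions)
pooled_counts  <- Reduce(`+`, counts_per_seq)

# Row-normalise the pooled counts to get transition probabilities
pooled_P <- prop.table(pooled_counts, margin = 1)

# Alternative from the N.B.: normalise each file first, then average.
# A state that never occurs as a source in some file gives 0/0 = NaN there.
per_seq_P <- lapply(counts_per_seq, prop.table, margin = 1)
avg_P     <- Reduce(`+`, per_seq_P) / length(per_seq_P)

print(round(pooled_P, 3))
print(round(avg_P, 3))
```

In general the two estimates differ. The pooled estimate is a weighted average of the per-file rows, weighted by how often the source state occurs in each file, so it matches the plain average only when those counts are equal across files.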