3

Let's use the dataset from this question:

dat<-data.frame(replicate(20,sample(c("A", "B", "C","D"), size = 100, replace=TRUE)))

Then we can build the transition matrix and the markov chain:

# Build transition matrix
trans.matrix <- function(X, prob=T)
{
  tt <- table( c(X[,-ncol(X)]), c(X[,-1]) )
  if(prob) tt <- tt / rowSums(tt)
  tt
}
trans.mat <- trans.matrix(as.matrix(dat))
attributes(trans.mat)$class <- 'matrix'

# Build markovchain
library(markovchain)
chain <- new('markovchain', transitionMatrix = trans.mat)

If I now encounter a new sequence, let's say AAABCAD can I then calculate the probability of observing this sequence given this markovchain?

Eskapp
  • 3,419
  • 2
  • 22
  • 39
KingBoomie
  • 278
  • 1
  • 12

1 Answers1

4

I cannot see a function in markovchain exactly for that, but it can be easily done manually too. There's one caveat though: the transition matrix does not provide the probability of observing the first A, which needs to be provided by you. Let it be 0.25, as it would be if all four states were equally likely (which is true in your example).

Then the transitions in the observed chain can be obtained with

cbind(head(obs, -1), obs[-1])
#      [,1] [,2]
# [1,] "A"  "A" 
# [2,] "A"  "A" 
# [3,] "A"  "B" 
# [4,] "B"  "C" 
# [5,] "C"  "A" 
# [6,] "A"  "D" 

Probabilities for each of those transitions then are

trans.mat[cbind(head(obs, -1), obs[-1])]
# [1] 0.2268722 0.2268722 0.2268722 0.2926316 0.2791165 0.2665198

and the final answer is 0.25 * (the product of the above vector):

0.25 * prod(trans.mat[cbind(head(obs, -1), obs[-1])])
# [1] 6.355069e-05

For comparison, we may estimate this probability by generating many chains of length 7:

dat <- replicate(2000000, paste(sample(c("A", "B", "C", "D"), size = 7, replace = TRUE), collapse = ""))
mean(dat == "AAABCAD")
# [1] 6.55e-05

Looks close enough!

Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
  • Thankyou Julius!, I can't apply this to a second order markov chain can I? (perhaps I should ask this in a separate question though) – KingBoomie Apr 10 '19 at 12:23
  • @KingBoomie, the logic would remain the same, but no, the same code cannot be directly applied. It's worth a separate question as it matters in what form you have the transition probabilities. – Julius Vainora Apr 10 '19 at 12:27
  • @JuliusVainora I posted my new question: https://stackoverflow.com/questions/55617539/r-build-second-order-transition-matrix-and-score-sequences ( I had to use another account ). As you pointed out the format of the transition matrix influences the way a new sequence should be scored, therefore I combined both in my question. – CodeNoob Apr 10 '19 at 16:44