Draw Markov chain given transition matrix in R

Question

Let trans_m be an n by n transition matrix of a first-order Markov chain. In my problem, n is large, say 10,000, so trans_m is stored as a sparse matrix built with the Matrix package; a dense matrix of that size would be huge. My goal is to simulate a Markov chain given a vector of initial states s1 and this transition matrix trans_m. Consider the following concrete example.

    library(Matrix) # provides Matrix() and the sparse matrix classes
    n <- 5000 # there are 5,000 states in this case.
    trans_m <- Matrix(0, nrow = n, ncol = n, sparse = TRUE)
    K <- 5 # the maximal number of states that could be reached.
    for(i in 1:n){
        states_reachable <- sample(1:n, size = K) # randomly pick K states that can be reached with equal probability.
        trans_m[i, states_reachable] <- 1/K
    }
    s1 <- sample(1:n, size = 1000, replace = TRUE) # generate 1000 initial states
    draw_next <- function(s) {
        .s <- sample(1:n, size = 1, prob = trans_m[s, ]) # given the current state s, draw the next state .s
        .s
    }
    sapply(s1, draw_next)

Given the vector of initial states s1 as above, I used sapply(s1, draw_next) to draw the next state. When n is larger, sapply becomes slow. Is there a better way?

The way you operate is that you take 100 initial states (but you put `size=1000` on the line defining `s1` ...) and then generate only the next state for each of them. So you do not want a series of `m` (`m>2`) states for each of your initial states, only the next state, correct? — Colonel Beauvel, Jul 19 '15 at 08:00
I do not get the point. You are creating a 5000*5000 matrix but you are using only 5 states. You can thus reduce your matrix to a 5*5 matrix for your simulation. — Colonel Beauvel, Jul 19 '15 at 08:21
@ColonelBeauvel There was a typo in my original post. I intended to draw 1000 initial states, though I typed 100 in the comment. Thanks for pointing that out. I cannot reduce the transition matrix to 5 by 5, because the five attainable next states depend on the conditioning variable (the current state). — semibruin, Jul 19 '15 at 14:19

score 1 · Accepted Answer · answered Jul 19 '15 at 17:26

Repeatedly indexing by row can be slow, so it's faster to work on the transpose of the transition matrix and use column indexing, and to factor out the indexing from the inner function:

R>    trans_m_t <- t(trans_m)
R>
R>    require(microbenchmark)
R>    microbenchmark(
+       apply(trans_m_t[,s1], 2,sample, x=n, size=1, replace=F)
+     ,
+       sapply(s1, draw_next)
+     )
Unit: milliseconds
                                                            expr        min          lq        mean      median          uq        max neval
 apply(trans_m_t[, s1], 2, sample, x = n, size = 1, replace = F) 111.828814 193.1139810 190.4379185 194.6563380 196.4273105 270.418189   100
                                           sapply(s1, draw_next) 499.255402 503.7398805 512.0849013 506.9467125 516.6082480 586.762573   100

Since you're already working with a sparse matrix, you might be able to get even better performance by working directly on the triplets. Using the higher level matrix operators can trigger recompression.
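
As a rough sketch of the triplet idea (not benchmarked here, and not necessarily what was meant above): pull the nonzero entries out once with `summary()`, which for a sparse matrix from the Matrix package returns the 1-based `(i, j, x)` triplets, group them by current state, and then sample only among the reachable states. The names `targets`, `probs`, and `draw_next_sparse` are made up for illustration, and the example matrix is rebuilt from triplets with `sparseMatrix()` only so that the snippet runs on its own.

```r
library(Matrix)

set.seed(1)  # only to make the sketch reproducible
n <- 5000
K <- 5

# Same structure as the example in the question, but built in one call
# from (i, j, x) triplets instead of assigning row by row.
i <- rep(seq_len(n), each = K)
j <- as.vector(replicate(n, sample.int(n, K)))
trans_m <- sparseMatrix(i = i, j = j, x = rep(1 / K, n * K), dims = c(n, n))
s1 <- sample.int(n, size = 1000, replace = TRUE)

# Extract the nonzero entries once and group them by current state (row).
trip <- summary(trans_m)                      # columns i, j, x
rows <- factor(trip$i, levels = seq_len(n))   # keep one slot per state
targets <- split(trip$j, rows)                # reachable next states
probs <- split(trip$x, rows)                  # matching probabilities

# Hypothetical helper: draw the next state using only the nonzero entries
# of row s, instead of a length-n probability vector.
draw_next_sparse <- function(s) {
    js <- targets[[s]]
    as.integer(js[sample.int(length(js), size = 1, prob = probs[[s]])])
}

next_states <- vapply(s1, draw_next_sparse, integer(1))
```

Whether this beats the transposed `apply` version will depend on `n`, `K`, and the number of draws, so it is worth running both through `microbenchmark` on your own matrix.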

Transpose is a good idea. I tried triplets, though. That is very handy. Thanks. — semibruin, Jul 19 '15 at 18:32
