1

I am trying to speed up a Monte Carlo simulation of a discrete time-inhomogeneous Markov chain using data.table or some form of parallelisation. Using random dummy transition matrices TM, I am simulating nSteps time steps in each of N simulations and starting from a an initial state vector initialState record the next updated state in currentState. At each time step, I matrix multiply the current state with the transition matrix TM.

Code 1 with loop

nStates <- 5 #number of states
initialState <- c(rep(1/nStates, nStates)) #vector with uniform initial states
nSteps <- 10 #number of time steps
N <- 10000 #number of simulations

ind.arr <- matrix(1:(N*nSteps),ncol=nSteps, byrow=TRUE)
currentState <- vector("list",(N*(nSteps))) #collects the nSteps state vectors for each simulation

system.time(
  for (i in 1:N) {
    TM <- matrix(runif(nStates^2), ncol=nStates) #random transition matrix for each time step and each simulation
    currentState[[(ind.arr[i,1])]] <- initialState %*% (TM / rowSums(TM)) #/rowSums(TM) ensures that TM is a transition matrix
    for (t in 2:nSteps){
      TM <- matrix(runif(nStates^2), ncol=nStates)
      currentState[[(ind.arr[i,t])]] <- currentState[[(ind.arr[i,t-1])]] %*% (TM / rowSums(TM))
    }
  })

The code is not super slow, but I am wondering whether avoiding the N-loop can accelerate the code. If I put the body of the N-loop in a function

statefun <- function(initialState, nSteps, nStates){
  TM <- matrix(runif(nStates^2), ncol=nStates) #random transition matrix for each time step and each simulation
  currentState <- matrix(rep(NA, nSteps*nStates), ncol=nStates)
  currentState[1,] <- initialState %*% (TM / rowSums(TM)) #/rowSums(TM) ensures that TM is a transition matrix
  for (t in 2:nSteps){
    TM <- matrix(runif(nStates^2), ncol=nStates)
    currentState[t,] <- currentState[t-1,] %*% (TM / rowSums(TM))
  }
  return(currentState)
}

and use data.table, I get an error and not the desired result

library(data.table)
system.time(dt <- data.table(i=1:N)[, c("s1", "s2", "s3", "s4", "s5") := list(statefun(initialState, nSteps, nStates)), by=i])

#As each simulation run is independent and the call of statefun is expensive, I was hoping that parallelisation helps to accelerate the code, but trying foreach is actually slower than where I started.  

library(foreach)
system.time(res <- foreach(i=1:N, .combine='c') %do% statefun(initialState, nSteps, nStates))

I appreciate any comment on how to make data.table work or using parallelisation in this case. Many thanks, Tim

@ EDIT:This one doesn't pick up the ten row output of the function call...

system.time( #does not work 
  dt <- data.table(i=1:N)[,c("s1", "s2", "s3", "s4", "s5"):=as.list(statefun(initialState, nSteps, nStates)),by=i]
)
Tim_Utrecht
  • 1,459
  • 6
  • 24
  • 44
  • you may try to use `RCUDA` or something of a kind to make computations with GPU instead of CPU. On large matrix operations it gives magnificent speed boost – inscaven May 22 '15 at 13:54
  • The `foreach` looks fine to me, though `.combine=c` gives a vector, so maybe you want `rbind` or `list` (and then to bind afterwards). `data.table` with `by` will not beat parallelization and storage in a data.table is best for mixed data formats with some grouping variables; while sticking with matrices makes more sense for your case. If you want to assign to a data.table, maybe do so after finishing the simulation. – Frank May 22 '15 at 15:17
  • @Tim_Utrecht created a post in which i would like to speed up the transitions. Can this be done? https://stackoverflow.com/questions/72443926/transition-probability-with-zero-frequency – Rstudent Jun 06 '22 at 14:44
  • @Frank I created a post in which I would like to speed up the transitions. Can this be done? https://stackoverflow.com/questions/72443926/transition-probability-with-zero-frequency – Rstudent Jun 06 '22 at 14:45

2 Answers2

2

If you convert the outer for loop into a foreach loop with 10,000 tasks, the performance isn't good because the tasks are too small. It's often better to make the number of tasks equal to the number of workers. Here's a simple way to do that using the idiv function from the iterators package:

library(doParallel)
nw <- 4
cl <- makePSOCKcluster(nw)
registerDoParallel(cl)
nStates <- 5
initialState <- c(rep(1/nStates, nStates))
nSteps <- 10
N <- 10000

currentState <- foreach(n=idiv(N, chunks=nw), .combine='c') %dopar% {
  ind.arr <- matrix(1:(n * nSteps), ncol=nSteps, byrow=TRUE)
  cur <- vector("list", n * nSteps)
  for (i in 1:n) {
    TM <- matrix(runif(nStates^2), ncol=nStates)
    cur[[ind.arr[i,1]]] <- initialState %*% (TM / rowSums(TM))
    for (t in 2:nSteps) {
      TM <- matrix(runif(nStates^2), ncol=nStates)
      cur[[(ind.arr[i,t])]] <-
          cur[[(ind.arr[i,t-1])]] %*% (TM / rowSums(TM))
    }
  }
  cur
}

Instead of simply parallelizing the outer for loop, this adds a foreach loop around a smaller version of the sequential code. So if you figure out a way to improve the sequential code you can easily use that in the parallel version. You may also get better performance by not returning all of the intermediate states.

Steve Weston
  • 19,197
  • 4
  • 59
  • 75
  • This is great, thanks. Speedup on my laptop about 60% and on another PC even 300%! Very much appreciated – Tim_Utrecht May 29 '15 at 10:22
  • @steve-weston I created a post in which i would like to speed up the transitions. Can this be done? https://stackoverflow.com/questions/72443926/transition-probability-with-zero-frequency – Rstudent Jun 06 '22 at 14:42
0

There is an example in this thread that could suit your needs. You would need to use replicate, from the lapply function in base.

 replicate(N, statefun(initialState, nSteps, nStates))
Community
  • 1
  • 1
erasmortg
  • 3,246
  • 1
  • 17
  • 34
  • 1
    I guess you don't have comment privileges yet, but fyi, this is better suited to a comment than an answer. – Frank May 22 '15 at 14:04
  • 1
    Yes, this was exactly my problem, but I figured it was the best option to raise awareness of that thread (seems to be a very similar issue). – erasmortg May 22 '15 at 14:05