I'm trying to run a random walk on a specific transition matrix (20,000 × 20,000), and so far I've been using the igraph::random_walk() function from the R package igraph.

The problem with that function is that it takes a graph as input, not a transition matrix. That means you first have to convert the transition matrix into a graph, using the following command:

# Transform transition matrix into graph
g <- igraph::graph.adjacency( as.matrix(tm), mode = "directed", weighted = TRUE )

Since my transition matrix is 20,000 × 20,000, the variable tm occupies around 3.1GB and the corresponding graph g occupies 13.3GB. The disadvantage of this approach is that the script fills up all the memory (on a system with 32GB of RAM), and sometimes the kernel (probably) kills the process.
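As a rough sanity check on the size of tm, here is a minimal sketch of the arithmetic, assuming the matrix is stored densely as doubles (8 bytes per entry). The names n and bytes are only illustrative, and the 13.3GB graph size is not derived here.

```r
# Estimated size of a dense n x n matrix of doubles (assumption: 8 bytes per entry)
n <- 20000
bytes <- n^2 * 8

bytes / 1e9    # size in decimal gigabytes
bytes / 2^30   # size in binary gibibytes
```

Either way the result is roughly 3GB, in line with the figure reported for tm.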

So I was wondering whether there is another R package (I couldn't find one) that runs a random walk directly on the transition matrix, without first converting it into a graph.

J. Doe
  • igraph is for sparse graphs. It is far from optimal if you have a dense graph. – Gabor Csardi Mar 04 '18 at 15:41
  • What is the aim of the random walk? If you are interested in the stationary distribution of occupancies and the graph is ergodic then there are other approaches to compute those (e.g. finding the leading eigenvector, etc.). – Paul Brodersen Mar 06 '18 at 14:20
  • For example, I want to count how many times the random walker visits specific nodes. – J. Doe Mar 06 '18 at 19:11
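The eigenvector approach mentioned in the comments can be sketched with a simple power iteration. This is only an illustration under stated assumptions: P is a small, hypothetical row-stochastic matrix generated at random (not the asker's tm), the chain is assumed ergodic, and the names P, pi_est, and the tolerance 1e-12 are made up for the example.

```r
set.seed(1)

# Hypothetical small row-stochastic matrix (each row sums to 1)
n <- 5
P <- matrix(runif(n^2), n, n)
P <- P / rowSums(P)

# Power iteration: repeatedly apply pi <- pi %*% P until it stops changing
pi_est <- rep(1 / n, n)
for (k in seq_len(1000)) {
  pi_new <- as.vector(pi_est %*% P)
  delta <- max(abs(pi_new - pi_est))
  pi_est <- pi_new
  if (delta < 1e-12) break
}

pi_est  # estimated stationary distribution
```

For a long walk of T steps on an ergodic chain, the expected number of visits to node j is approximately T * pi_est[j], which may be enough if only visit counts are needed.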

1 Answer

What about implementing it manually?

set.seed(1)

# Variant of sample() suggested in ?sample: safe when x has length one
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Simulate a symmetric 0/1 adjacency matrix with no self-loops
n <- 5000
tm <- matrix(sample(0:1, n^2, prob = c(0.95, 0.05), replace = TRUE), n, n)
tm <- (tm == 1 | t(tm) == 1) * 1
diag(tm) <- 0

start <- 23 # Random walk starting vertex
len <- 10 # Walk length
path <- c(start, rep(NA, len))
for (i in 2:(len + 1)) {
  idx <- tm[path[i - 1], ] != 0 # Vertices reachable from the current vertex
  if (any(idx)) {
    # Row entries act as sampling weights; they need not sum to 1
    path[i] <- resample(which(idx), 1, prob = tm[path[i - 1], idx])
  } else {
    break # Stopping if we get stuck
  }
}
path
#  [1]   23 3434 4908 4600  332 4266 1752 1845 4847 4817 1992
Julius Vainora
  • What do you mean exactly? What is the result? – J. Doe Mar 21 '18 at 10:35
  • @J.Doe, I simulated a 5000x5000 adjacency matrix and implemented a random walk of length 1000 starting from vertex 23; the resulting path is `path`. – Julius Vainora Mar 21 '18 at 11:42
  • Oh, sorry. I misread your initial answer. I haven't checked it yet, but I will today and I'll let you know. Thank you very much. – J. Doe Mar 21 '18 at 11:44
  • I see that your transition matrix uses 1s and 0s. Normally, the rows of a transition matrix sum to 1, and the values represent the probabilities of the random walker moving from one node to another. So the first row could have the values `0.2 0.1 0.1 0.1 0.2 0.3` for moving to A, B, C, D, E, F respectively, and the walker should choose its next step according to these values. It should be like rolling a 10-faced die where 3 faces say F, 2 say A, 2 say E, and 1 each say B, C and D. I don't know whether the `sample.int` function you used works this way. – J. Doe Mar 21 '18 at 17:03
  • @J.Doe, right, I was assuming an unweighted graph. Now the updated version allows for weights/probabilities. – Julius Vainora Mar 21 '18 at 17:08
  • @J.Doe, by the way, `resample` is simply a variant of `sample` suggested in `?sample`. It deals with problematic cases where the vector to sample from has length one: `sample(5, 1)` draws from `1:5`, so it can return `4`. – Julius Vainora Mar 21 '18 at 17:29
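To tie the comments together, here is a minimal sketch of the same manual walk wrapped in a function, run on a row-stochastic matrix and followed by a visit count. The function name random_walk_tm and the 3 × 3 matrix are hypothetical, chosen only for illustration. The sketch uses sample.int(n, 1, prob = ...) over all n columns, which sidesteps the length-one issue that resample handles, since entries with zero probability are never drawn.

```r
# Hypothetical helper: walk directly on a transition matrix, no graph object needed
random_walk_tm <- function(tm, start, len) {
  n <- ncol(tm)
  path <- integer(len + 1)
  path[1] <- start
  for (i in seq_len(len)) {
    p <- tm[path[i], ]
    if (!any(p > 0)) {
      # Dead end: return the part of the walk completed so far
      return(path[seq_len(i)])
    }
    # Draw the next vertex with probability proportional to the current row
    path[i + 1] <- sample.int(n, 1, prob = p)
  }
  path
}

set.seed(1)
# Small illustrative transition matrix; each row sums to 1
tm_small <- matrix(c(0.0, 0.5, 0.5,
                     0.2, 0.0, 0.8,
                     1.0, 0.0, 0.0), nrow = 3, byrow = TRUE)

walk <- random_walk_tm(tm_small, start = 1, len = 10000)

# Number of visits to each vertex, and the corresponding visit frequencies
visits <- tabulate(walk, nbins = nrow(tm_small))
visits
visits / length(walk)
```

Only one row of the matrix is read per step, so memory use beyond the matrix itself grows with the walk length rather than with the number of edges.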