More efficient loops thru unique values in R

Question

I am relatively new to R and all its wisdom and I am trying to be more efficient with my script. I am using a loop to simulate how an animal moves among different sites. The problem that I have is that when I increase the number of sites or change the initial parameters (based on fixed probability of moving or staying in the same site) then I end with a very complicated loop. If I have to run several different simulations with different parameters, I prefer a more efficient loop or function that could adjust to different situations. The first loop will fill a matrix according to the initial probabilities and the second loop will compared the cumulative probability matrix against a random number from a list of values (10 in this example) and will decide the fate of that individual (either stay or go to a new site)

Here is a simplification of my code:

N<-4 # number of sites
sites<-LETTERS[seq(from=1,to=N)]

p.stay<-0.45
p.move<-0.4

move<-matrix(c(0),nrow=N,ncol=N,dimnames=list(c(sites),c(sites)))
from<-array(0,c(N,N),dimnames=list(c(sites),c(sites)))
to<-array(0,c(N,N),dimnames=list(c(sites),c(sites)))

# Filling matrix with fixed probability #

for(from in 1:N){
  for(to in 1:N){
     if(from==to){move[from,to]<-p.stay} else {move[from,to]<-p.move/(N-1)}
     }
 }

move
cumsum.move<-cumsum(data.frame(move))

steps<-100
result<-as.character("") # for storing results
rand<-sample(random,steps,replace=TRUE) 
time.step<-data.frame(rand)
colnames(time.step)<-c("time.step")
time.step$event<-""
to.r<-(rbind(sites))
j<-sample(1:N,1,replace=T) # first column to select (random number)
k<-sample(1:N,1,replace=T) # site selected after leaving and coming back

# Beginning of the longer loop #

for(i in 1:steps){
   if (time.step$time.step[i]<cumsum.move[1,j]){time.step$event[i]<-to.r[1]} else 
     if (time.step$time.step[i]<cumsum.move[2,j]){time.step$event[i]<-to.r[2]} else
        if (time.step$time.step[i]<cumsum.move[3,j]){time.step$event[i]<-to.r[3]} else
           if (time.step$time.step[i]<cumsum.move[4,j]){time.step$event[i]<-to.r[4]} else
              if (time.step$time.step[i]<(0.95)){time.step$event[i]<-NA} else
                 if (time.step$time.step[i]<1.0) break # break the loop             

 result[i]<-time.step$event[i]
 j<-which(to.r==result[i])
 if(length(j)==0){j<-k} # for individuals the leave and come back later

 }

 time.step

 result

This loop is part of a bigger loop that will simulate and store the result after a series of simulations. Any ideas or comments on how I can improve the efficiency of this loop so that I can easily modify the number of sites or change the initial probability parameters without repeating or having to do major edits of the loop will be appreciated.

What would you say is the range of the variable `random`? What kind of distribution will it be? In the end of your loop are you sure it's not supposed to be greater than, rather than less than 1 (or 0.9)? — Brandon Bertelsen, Oct 24 '12 at 07:52
You were right, when I changed my original code I did break it. It should be fine now. The range of the random variable should be between 0-1. The purpose of the loop is to compared the cumulative probability matrix with each random value from the variable (in this case time.step). The first column to be evaluated against this random number is also selected at random (j), then it will move from there. If the random value is between 0.95 and the last row of cumulative probabilities, then it will leave the system and come back on a future time.step. If its >0.95 it will break the loop. — user1626688, Oct 24 '12 at 08:06
Hi Brandon. Thanks for your previous comment. Any ideas o how to integrate that piece of code with the last new lines of my script? I have been trying with little success. Any ideas will be welcome (@Brandon Bertelsen). — user1626688, Oct 24 '12 at 22:18
I provided a solution to your previous version, but this edit is tricky. I don't have any good ideas. But you could likely use the same strategy of applying a function via sapply() to gain speedups. I'll undelete my previous answer so you can see what I did. — Brandon Bertelsen, Oct 25 '12 at 00:51

score 0 · Answer 1 · answered Oct 24 '12 at 04:30

I'm not sure if I'm capturing the essence of your code, but this is faster than the for loops. This started having an advantage as soon as we start getting past a few thousand steps. I replace "random" with a sample of the uniform distribution (runif())

system.time(
time.step$event <- sapply(
  time.step$time.step, 
  function(x) rownames(
    cumsum.move[which(cumsum.move[,j] > x),])[[1]]
  )
)

Here are my results @ 10,000 steps. I'm working on a laptop so 100,000 with the for loop didn't compute in under 1 minute, but sapply did it in 14 seconds.

> system.time(
+ time.step$event <- sapply(
+ time.step$time.step,
+ function(x) rownames(
+ cumsum.move[which(cumsum.move[,j] > x),])[[1]]
+ )
+ )
   user  system elapsed 
  1.384   0.000   1.387 
> head(time.step)
  time.step event
1 0.2787642     C
2 0.3098240     C
3 0.9079045     D
4 0.9904031     D
5 0.3754330     C
6 0.6984415     C
> system.time(
+ for(i in 1:steps){
+ if (time.step$time.step[i]<cumsum.move[1,j]){time.step$event[i]<-to.r[1]} else
+ if (time.step$time.step[i]<cumsum.move[2,j]){time.step$event[i]<-to.r[2]} else
+ if (time.step$time.step[i]<cumsum.move[3,j]){time.step$event[i]<-to.r[3]} else
+ if (time.step$time.step[i]<cumsum.move[4,j]){time.step$event[i]<-to.r[4]}
+ result[i]<-time.step$event[i]
+ }
+ )
   user  system elapsed 
  3.137   0.000   3.143 
> head(time.step)
  time.step event
1 0.2787642     C
2 0.3098240     C
3 0.9079045     D
4 0.9904031     D
5 0.3754330     C
6 0.6984415     C

which() isn't necessary, that shaves off a bit more time as well. The indexing takes a fair bit of time as well. I'm sure someone else could get it faster. — Brandon Bertelsen, Oct 24 '12 at 04:40
Thanks Brandon, your code was useful but I forgot to include an edit that will make it a little bit more complicated. — user1626688, Oct 24 '12 at 06:28
Hey Brandon, what exactly is (x) from the function? In the original loop I was comparing the rows from the cumulative matrix with random values, but I am not sure if this is doing the same here. Also why at the end of the code you used [[1]]? Any comments are appreciated! — user1626688, Oct 25 '12 at 05:24
x is each element in the column in this case. I used the subset, [[1]] because rownames returns a vector of the row names, but in this example we only wanted the first one. — Brandon Bertelsen, Oct 25 '12 at 08:28

More efficient loops thru unique values in R

1 Answers1