
I would like to know if there is a more efficient way to speed up the code below. It uses a procedure where subsampling is required in the nested loop (which a previous answer, https://stackoverflow.com/a/13629611/1176697, helped to make more efficient). R has a tendency to hang when B=500, although the computer's OS isn't unduly affected.

The goal is to run the code below with B=1000 and larger m values (m=75, m=100, m=150).

I have detailed the procedure in the code below and included a link to a reproducible data set.

#Estimation of order m `leave one out' hyperbolic efficiency scores

#The procedure sequentially works through each observation in `IOs' and 
#calculates a DEA order M efficiency score by leaving out the observation
# under investigation in the DEA reference set

# Step 1: Load the packages, create Inputs (x) and Outputs (y), choose 
# m(the order of the partial frontier or reference set to be used in DEA)
# and B the number of monte carlo simulations of each m order DEA estimate

# Step 2: For each observation a in x and y, matrices x1 and y1
# are created which 'leave out' this observation.

# Step 3: From these matrices subsamples (xref, yref) of size [m,] are 
# taken and used in DEA estimation.

# Step 4: The DEA estimation uses the m subsample from step 3
# as a reference set and evaluates the efficiency of the observation that
# has been 'left out'
#(thus the first two arguments in DEA are matrices of order [1,3] )

# Step 5: Steps 3 and 4 are repeated B times to obtain B simulations of the 
# order m efficiency score and a mean and standard deviation are
# calculated and placed in effm.

# IOs data can be found here: https://dl.dropbox.com/u/1972975/IOs.txt
# From IOs an Input matrix (x[1376,3]) and an Output matrix (y[1376,3])
# are created.
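A minimal sketch of loading the linked data set, assuming the file is whitespace-delimited with a header row and six numeric columns (three inputs followed by three outputs); adjust `header`/`sep` to match the actual file:

```r
# Sketch only: read the IOs data set from the downloaded file.
# Assumes whitespace-delimited text with a header row.
IOs <- read.table("IOs.txt", header = TRUE)
str(IOs)  # expect 1376 observations of 6 variables
```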

library(Benchmarking)
x <- IOs[,1:3]
y <- IOs[,4:6]
A <- nrow(x)
effm <- matrix(nrow = A, ncol = 2)
m <- 50
B <- 500
pb <- txtProgressBar(min = 0,
                     max = A, style=3)

for(a in 1:A) {
  x1 <- x[-a,]
  y1 <- y[-a,]
  theta <- numeric(B)
  xynrow <- nrow(x1)
  mB <- m * B
  # Draw all B subsamples of size m at once
  # (https://stackoverflow.com/a/13629611/1176697)
  xrefm <- x1[sample(1:xynrow, mB, replace=TRUE),]
  yrefm <- y1[sample(1:xynrow, mB, replace=TRUE),]
  deaX <- as.matrix(x[a,])  # 1 x 3 matrix; as.matrix() has no ncol argument
  deaY <- as.matrix(y[a,])

  for(i in 1:B){
    theta[i] <- dea(deaX, deaY, RTS = 'vrs', ORIENTATION = 'graph',
                    xrefm[(1:m) + (i-1) * m,], yrefm[(1:m) + (i-1) * m,], FAST=TRUE)
  }

  effm[a,1] <- mean(theta)
  effm[a,2] <- sd(theta) / sqrt(B)
  setTxtProgressBar(pb, a) 
}
close(pb)
  • Session information: R version 2.15.2 (2012-10-26); Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit); locale: C/en_US.UTF-8/C/C/C/C; attached base packages: stats, graphics, grDevices, utils, datasets, methods, base; other attached packages: Benchmarking_0.21, ucminf_1.1-3, lpSolveAPI_5.5.2.0-5; loaded via a namespace (and not attached): tools_2.15.2 – barryq Dec 05 '12 at 06:58
  • I have not analysed your question in detail, so I am not sure if my comment will help you. I'm doing Monte Carlo simulations with the help of two packages, `doMC` and `foreach`. There is a function `foreach` that I'm using for nested loops (see the documentation on CRAN). This is an example of nested looping: `foreach(a = 1:A, .combine = rbind, .inorder = F) %:% foreach(i = 1:B, .combine = rbind, .inorder = F) %dopar% {}` – djhurio Dec 05 '12 at 08:51
  • I think this belongs on [Code Review](http://codereview.stackexchange.com). – Roland Dec 05 '12 at 08:52
  • Forgot to say the main point. Parallel computing is an approach you should be looking for to speed it up (if you can run the iterations in parallel). – djhurio Dec 05 '12 at 09:00
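A minimal sketch of the parallel approach suggested in the comments, assuming the `foreach` and `doMC` packages (doMC is Unix/macOS only; use `doParallel` on Windows). Each iteration of the outer loop is independent of the others, so the loop body from the question can be moved into a `%dopar%` block largely unchanged; the core count is an assumption to adjust for your machine:

```r
# Sketch only: parallelise the outer loop over observations with
# foreach/doMC, as suggested by djhurio. Objects x, y, A, m, B are
# from the question; the %dopar% body mirrors the serial loop.
library(foreach)
library(doMC)
library(Benchmarking)
registerDoMC(cores = 4)  # assumption: adjust to your hardware

effm <- foreach(a = 1:A, .combine = rbind, .inorder = TRUE,
                .packages = "Benchmarking") %dopar% {
  x1 <- x[-a, ]
  y1 <- y[-a, ]
  mB <- m * B
  # Draw all B subsamples of size m at once, as in the serial version
  xrefm <- x1[sample(nrow(x1), mB, replace = TRUE), ]
  yrefm <- y1[sample(nrow(y1), mB, replace = TRUE), ]
  deaX <- as.matrix(x[a, ])
  deaY <- as.matrix(y[a, ])
  theta <- numeric(B)
  for (i in 1:B) {
    theta[i] <- dea(deaX, deaY, RTS = "vrs", ORIENTATION = "graph",
                    xrefm[(1:m) + (i - 1) * m, ],
                    yrefm[(1:m) + (i - 1) * m, ], FAST = TRUE)
  }
  # Each iteration returns one row; .combine = rbind rebuilds effm
  c(mean(theta), sd(theta) / sqrt(B))
}
```

Note that the progress bar does not work inside `%dopar%`, and reproducibility across workers would need a parallel-safe RNG (e.g. the `doRNG` package).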

0 Answers