3

I want to make the code below more efficient by using the foreach package. I tried it for a very long time but I don't manage to get the same result as when using the for-loops. I would like to use a nested foreach-loop including parallelization... And as output I would like to have two matrices with dim [R,b1] I would be very grateful for some suggestions!!

n  <- c(100, 300, 500)
R  <- 100
b0 <- 110
b1 <- seq(0.01, 0.1, length.out = 100)

## all combinations of n and b1
grid <- expand.grid(n, b1)
names(grid) <- c("n", "b1")

calcPower <- function( R, b0, grid) {
  
  cl <- makeCluster(3)
  registerDoParallel(cl)
  
  ## n and b1 coefficients
  n  <- grid$n
  b1 <- grid$b1
  
  ## ensures reproducibility
  set.seed(2020)
  x      <- runif(n, 18, 80)
  x.dich <- factor( ifelse( x < median( x), 0, 1))
  
  ## enables to store two outputs
  solution <- list()
  
  ## .options.RNG ensures reproducibility
  res <- foreach(i = 1:R, .combine = rbind, .inorder = TRUE, .options.RNG = 666) %dorng% { 
    p.val   <- list()
    p.val.d <- list()
  
    for( j in seq_along(b1)) {
      
      y <- b0 + b1[j] * x + rnorm(n, 0, sd = 10)
      
      mod.lm   <- lm( y ~ x)
      mod.lm.d <- lm( y ~ x.dich)
      
      p.val    <- c( p.val,   ifelse( summary(mod.lm)$coef[2,4]   <= 0.05, 1, 0))
      p.val.d  <- c( p.val.d, ifelse( summary(mod.lm.d)$coef[2,4] <= 0.05, 1, 0))
    }
    
    solution[[1]] <- p.val
    solution[[2]] <- p.val.d
    
    return(solution)
    }
  
  dp.val   <- matrix( unlist(res[,1], use.names = FALSE), R, length(b1), byrow = TRUE)
  dp.val.d <- matrix( unlist(res[,2], use.names = FALSE), R, length(b1), byrow = TRUE)
  
  stopCluster(cl)

  df <- data.frame(
    effectS = b1,
    power   = apply( dp.val,   2, function(x){ mean(x) * 100}),
    power.d = apply( dp.val.d, 2, function(x){ mean(x) * 100}),
    n = factor(n))
  
  return(df)
}

## simulation for different n
tmp <- with(grid,
            by( grid, n,
               calcPower, R = R, b0 = b0))

## combines the 3 results
df.power  <- rbind(tmp[[1]], tmp[[2]], tmp[[3]])

Andrea
  • 33
  • 3

1 Answers1

1

I created a foreach loop in following code. There had to be some changes made. It is a lot easier to return a list then a matrix in foreach, since it's combined with rbind. Especially when you want to return multiple ones. My solution here is to save everything in a list and afterwards transform it into a matrix of length 100.

Note: there is one mistake in your code. summary( mod.lm.d)$coef[2,4] does not exist. I changed it to [2]. Adjust to your needing

solution <- list()
df2<-foreach(i = 1:R, .combine = rbind, .inorder=TRUE) %dopar%{
  set.seed(i)
  p.val <- list()
  p.val.d <- list()
  counter <- list()
  for( j in seq_along(b1)){
    
    x      <- sort( runif(n, 18, 80))
    x.dich <- factor( ifelse( x < median(x), 0, 1))
    y      <- b0 + b1[j] * x + rnorm( n, 0, sd = 10)
    
    mod.lm   <- lm( y ~ x)
    mod.lm.d <- lm( y ~ x.dich)
    
    p.val    <- c(p.val, ifelse( summary( mod.lm)$coef[2] <= 0.05, 1, 0))
    p.val.d  <- c(p.val.d, ifelse( summary( mod.lm.d)$coef[2] <= 0.05, 1, 0))
    counter <- c(counter, j)
  }
  solution[[1]] <- p.val
  solution[[2]] <- p.val.d
  solution[[3]] <- counter
  return(solution)
}

dp.val <- unlist(df2[,1], use.names = FALSE)
dp.val.d <-  unlist(df2[,2], use.names = FALSE)

dp.val.matr <- matrix(dp.val, R, length(b1))
dp.val.d.matr <- matrix(dp.val.d, R, length(b1))

stopCluster(cl)

for your comment:

A foreach does work with a normal for loop. Minimal reproducible example:

df<-foreach(i = 1:R, .combine = cbind, .inorder=TRUE) %dopar%{
  x <- list()
  for(j in 1:3){
    x <- c(x,j)
  }
  return(x)
}
mischva11
  • 2,811
  • 3
  • 18
  • 34
  • Now I realized that this does not what I assumed. It's not possible to use a normal for-loop within a foreach-loop. j is always = 1 and does not iterate through 1:length(b1). Just to let you know. And sorry for complaining again! – Andrea Dec 04 '20 at 20:29
  • @Andrea a `foreach` does work with a `for` loop. I inserted one example in the answer. Also I'm gonna checking my solution before, not quite sure if something went wrong. Maybe I did a mistake in joining the data. – mischva11 Dec 05 '20 at 10:55
  • @Andrea I think I made a mistake in handling your variables. I changed the code to a more saver way to deal with your problem. Also I inserted you a counter variable, you can call it with `df2[,3]`. You can see the `j` does iterate. But I recommend to remove it. It's just unnecessary timeloss. – mischva11 Dec 05 '20 at 11:15