Parallel Processing packages in R with user function and multiple outcomes

Question

I'm working on trying to make my model fitting procedure in R more efficient. Currently, I have all of my data generated with 1500 sims for 15 variables. This data is stored in an array, with each level being one sim, each row being one "person" and each column being one of the 15 variables (eg., 300 x 15 x 1500). I then pass one layer of the array through mplusObject numerous times, fitting different LPA models (one class, two class, etc). For each of these models, there are numerous outcomes that get reported and saved. I've been working for a while now trying to figure out how to speed this up using parallel processing given that the data is pre-generated and one layer of the array doesn't depend on the other. I'll show what I currently have below, but it isn't working, so I'm wondering if I need a different package. Thanks!

inp <- array(1:(300*15*1500), dim=(300,15,1500)) #Really there's actual data here, not random values, but the data generation process is a whole other thing. 
results <- results = matrix(NA,1500,129) #A results table for values to be written to, filled with NAs, 1500 simulations, 129 results.   
num_sims=1500

foreach(i=1:num_sims, .packages=c('mclust','MplusAutomation')) %dopar% {
     working <- inp[,,i]
     sim_num=i
     results[sim_num,1] = working[1,17] #number of groups
     results[sim_num,2] = working[1,18] #sample size 1
     results[sim_num,3] = working[1,19] #sample size 2
     results[sim_num,4] = working[1,20] #sample size 3
     results[sim_num,5] = working[1,21] #dist2
     results[sim_num,6] = working[1,22] #dist3
     df <- as.data.frame(working[,1:15])
  
     lpa1_15 <- mplusObject(
     TITLE = "1-Class LPA;",
     VARIABLE = "USEVARIABLES = x01-x15;
     CLASSES=c(1);",
     ANALYSIS = "ESTIMATOR = MLR;
     TYPE=MIXTURE;",
     MODEL = "
     %OVERALL%
     x01-x15;
     [x01-x15];
     %c#1%
     x01-x15;
     [x01-x15];",
     usevariables = c("x01", "x02", "x03", "x04", "x05",
                      "x06", "x07", "x08", "x09", "x10",
                      "x11", "x12", "x13", "x14", "x15"),
     rdata = df)
  
   lpa1_15_fit = mplusModeler(lpa1_15, "df.dat", modelout = "lpa1_15.inp", killOnFail = FALSE, run = 1L)
  
   if (!is.null(lpa1_15_fit$results$summaries$LL)){
     results[sim_num,7]  = -2 * lpa1_15_fit$results$summaries$LL
     results[sim_num,8]  =      lpa1_15_fit$results$summaries$BIC
     results[sim_num,9]  =      lpa1_15_fit$results$summaries$aBIC
     results[sim_num,10]  =     lpa1_15_fit$results$summaries$AIC
     results[sim_num,11]  =     lpa1_15_fit$results$summaries$AICC}
   
lpa2_15 <- mplusObject(
     TITLE = "2-Class LPA;",
     VARIABLE = "USEVARIABLES = x01-x15;
     CLASSES=c(2);",
     ANALYSIS = "ESTIMATOR = MLR;
     TYPE=MIXTURE;",
     MODEL = "
     %OVERALL%
     x01-x15;
     [x01-x15];
     %c#1%
     x01-x15;
    [x01-x15];
     %c#2%
     x01-x15;
     [x01-x15];",
     OUTPUT = "TECH11;",
     usevariables = c("x01", "x02", "x03", "x04", "x05",
                      "x06", "x07", "x08", "x09", "x10",
                      "x11", "x12", "x13", "x14", "x15"),
     rdata = df)
  
   lpa2_15_fit = mplusModeler(lpa2_15, "df.dat", modelout = "lpa2_15.inp", killOnFail = FALSE, run = 1L)
  
   if (!is.null(lpa2_15_fit$results$summaries$LL)){
     results[sim_num,12]  = -2 * lpa2_15_fit$results$summaries$LL
     results[sim_num,13]  =      lpa2_15_fit$results$summaries$BIC
     results[sim_num,14]  =      lpa2_15_fit$results$summaries$aBIC
     results[sim_num,15]  =      lpa2_15_fit$results$summaries$AIC
     results[sim_num,16]  =      lpa2_15_fit$results$summaries$AICC
     results[sim_num,17]  =      lpa2_15_fit$results$summaries$Entropy
   if (!is.null(lpa2_15_fit$results$summaries$T11_VLMR_2xLLDiff)){
     results[sim_num,18]  =      lpa2_15_fit$results$summaries$T11_VLMR_2xLLDiff
     results[sim_num,19]  =      lpa2_15_fit$results$summaries$T11_VLMR_PValue
     results[sim_num,20]  =      lpa2_15_fit$results$summaries$T11_LMR_Value
     results[sim_num,21]  =      lpa2_15_fit$results$summaries$T11_LMR_PValue}
   
... and so on...
}

The results I got from running this were:

[[1]]
[1] 0.491

[[2]]
[1] 0.7037

I've tried using parallel, foreach and dopar, and parLapply, but just can't get them to work. The closest I got was using the foreach function, but that returned a single value for each and none of the results were saved to the results table. I can provide the code for how I attempted these, but none of them worked really, so at this point I'm questioning if it can be done (and if so, which method/approach is best for this setup).

I should also point out that the levels of data can be run in any order (eg., [,,1], [,,5], [,,3]) is okay, but once that level is called the full function (or however it should be set up) should be run, as several tests compare the current model to the previous model (3 classes vs 2 classes) for that dataset, so in that sense the data does have to be run in order.

Thanks for any help or suggestions you might have!

lol still new to this. Do I add it here or edit the original post? — PSB, Oct 24 '21 at 22:01
You need ans <- foreach(... `foreach` is not like a for loop, it's more like an lapply loop. — Brian Montgomery, Oct 24 '21 at 22:27
So the function can only return a single value? or can I coerce it to return the 129 values I want? so I'd get something like [[1]] [1] # # # # # ... 129th # — PSB, Oct 24 '21 at 22:31
It can return a single object. Look at the `.combine` argument. The object can be a large list or data.frame if you want. — Brian Montgomery, Oct 24 '21 at 23:19

Parallel Processing packages in R with user function and multiple outcomes

0 Answers0